Support for REINDEX CONCURRENTLY
Hi all,
One of the outcomes of the discussions about the integration of pg_reorg in
core was that Postgres should provide a way to perform REINDEX, CLUSTER and
ALTER TABLE concurrently, with low-level locks, in a way similar to
CREATE INDEX CONCURRENTLY.
Those discussions can be found in this thread:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php
Well, I spent some spare time working on an implementation of REINDEX
CONCURRENTLY.
It basically allows read and write operations on a table while its indexes
are being reindexed, which is pretty useful for a production environment.
The caveats of this feature are that it is slower than a normal reindex and
that it impacts other backends with the extra CPU, memory and I/O it uses.
The implementation is based on the same ideas as pg_reorg and on an idea of
Andres.
Please find attached a version that I consider a base for further
discussion, and perhaps one that could be submitted to the commit fest next
month. The patch applies on top of Postgres master at commit 09ac603.
With this feature, you can rebuild the indexes of a table, or a single
index, with commands such as:
REINDEX INDEX ind CONCURRENTLY;
REINDEX TABLE tab CONCURRENTLY;
The following restrictions apply:
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- Shared tables cannot be reindexed concurrently.
- Indexes for exclusion constraints cannot be reindexed concurrently.
- If the table has toast relations, those are reindexed non-concurrently
when the table reindex is done.
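To illustrate the transaction-block restriction, here is a sketch of what a session would look like (the exact error wording is illustrative, not taken from the patch):

```sql
-- REINDEX CONCURRENTLY manages its own transactions, so it is
-- rejected inside an explicit transaction block.
BEGIN;
REINDEX INDEX ind CONCURRENTLY;  -- fails with an ERROR
ROLLBACK;

-- Outside a transaction block it is accepted.
REINDEX INDEX ind CONCURRENTLY;
```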
Here is a description of what happens when an index is reorganized
concurrently (the beginning of the process is similar to CREATE INDEX
CONCURRENTLY):
1) Creation of a new index based on the same columns and restrictions as
the index that is rebuilt (called here the old index). The new index is
named $OLDINDEX_cct, so only a suffix _cct is added. It is marked as
invalid and not ready.
2) Take session locks on the old and new index(es), and on the parent
table, to prevent unfortunate drops.
3) Commit and start a new transaction.
4) Wait until no running transactions could have the table open with the
old list of indexes.
5) Build the new indexes. All the new indexes are marked as indisready.
6) Commit and start a new transaction.
7) Wait until no running transactions could have the table open with the
old list of indexes.
8) Take a reference snapshot and validate the new indexes.
9) Wait for the old snapshots based on the reference snapshot.
10) Mark the new indexes as indisvalid.
11) Commit and start a new transaction. At this point the old and new
indexes are both valid.
12) Take a new reference snapshot and wait for the old snapshots, to
ensure that the old indexes are not corrupted.
13) Mark the old indexes as invalid.
14) Swap the new and old indexes by switching their names.
15) Commit and start a new transaction.
16) Wait for transactions that might still use the old indexes.
17) Mark the old indexes as not ready.
18) Commit and start a new transaction.
19) Drop the old indexes.
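As a side note on step 14: since two relations can never hold the same name at the same time, the swap is done through a temporary name, with three renames. A minimal sketch of the idea (plain Python over a toy catalog, not the patch's C code; names and OIDs are illustrative):

```python
def swap_names(catalog, new_idx_oid, old_idx_oid):
    """Swap the names of two indexes through a temporary name,
    mimicking the rename dance of index_concurrent_swap(): no two
    relations may share a name, hence three renames are needed."""
    name_new = catalog[new_idx_oid]        # e.g. "idx_cct"
    name_old = catalog[old_idx_oid]        # e.g. "idx"
    temp = "cct_%d" % old_idx_oid          # temporary name from the OID
    catalog[old_idx_oid] = temp            # old index -> temporary name
    catalog[new_idx_oid] = name_old        # new index takes the old name
    catalog[old_idx_oid] = name_new        # old index takes the new name
    return catalog

# Toy catalog mapping OIDs to relation names
catalog = {16384: "idx", 16385: "idx_cct"}
swap_names(catalog, 16385, 16384)
```

After the swap, the new index carries the original name "idx" and the old index carries "idx_cct", ready to be invalidated and dropped.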
This process might be reducible, but I would like that to be decided based
on community feedback and experience with such concurrent features.
For the time being I have taken an approach that looks slower, but that is
safer to my mind, with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If an error occurs during the process, the table will end up with either
the old or the new index marked as invalid. In this case the user is in
charge of dropping the invalid index himself.
The concurrent index can easily be identified by its suffix *_cct.
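To illustrate the recovery path, assuming an index ind on table tab whose concurrent rebuild failed (names are illustrative):

```sql
-- The failed rebuild leaves behind an invalid index with the _cct
-- suffix; drop it and simply run the command again.
DROP INDEX ind_cct;
REINDEX INDEX ind CONCURRENTLY;
```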
This patch required some refactoring effort, as I noticed that the index
code for concurrent operations was not very generic. To address that, I
created new functions in index.c called index_concurrent_* which are used
by both CREATE INDEX and REINDEX in my patch. Some refactoring has also
been done for the wait logic.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch relies as much as
possible on the functions of index.c when creating, building and
validating a concurrent index.
Based on the comments of this thread, I would like to submit the patch to
the next commit fest. Just let me know if the approach taken by the current
implementation is OK or if it needs some modifications. That would be
really helpful.
The patch includes regression tests for error checks as well as some
documentation.
Regression tests pass, and the code has no trailing whitespace and no
compilation warnings.
I have also tested read and write operations using index scans on the
parent table at each step of the process (by using gdb to stop the reindex
process at precise places).
Thanks, and looking forward to your feedback,
--
Michael Paquier
http://michael.otacoo.com
Attachments:
20121003_reindex_concurrent.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..2931329 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ] [ CONCURRENTLY ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,10 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will not perform a concurrent build if <literal>
+ CONCURRENTLY</> is not specified. To build the index without interfering
+ with production you should drop the index and reissue the <command>CREATE
+ INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> command.
</para>
</listitem>
@@ -139,6 +140,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +247,93 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index that will replace the one to
+ be rebuilt is first entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to build the new
+ index and make it valid for the other backends. Once this is done, the
+ old and fresh indexes are swapped, and the old index is marked as invalid
+ in a third transaction. Finally two additional transactions are used to mark
+ the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and perform <command>REINDEX CONCURRENTLY</> once again.
+ The concurrent index created during the processing has a name ending with
+ the suffix _cct.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ </refsect2>
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 464950b..ca41255 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1076,6 +1076,267 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which is based the index needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options, /* This needs to be checked */
+ indexRelation->rd_index->indisprimary,
+ false, /* is constraint? */
+ false, /* is deferrable? */
+ false, /* is initially deferred? */
+ false, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true); /* concurrent? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build index for a concurrent operation. Low-level locks are taken when this
+ * operation is performed.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_mark
+ *
+ * Update the pg_index row to mark the index with a new status. All the
+ * operations that can be performed on the index marking are listed in
+ * IndexMarkOperation.
+ * When a marking modification is done, the caller needs to commit the
+ * current transaction, as any new transactions that open the table might
+ * perform read or write operations on the related table.
+ * - INDEX_MARK_READY, index is marked as ready for inserts. When marked as
+ * ready, the index needs to be invalid.
+ * - INDEX_MARK_NOT_READY, index is marked as not ready for inserts. When
+ * marked as not ready, the index needs to be already invalid.
+ * - INDEX_MARK_VALID, index is marked as valid for selects. When marked as
+ * valid, the index needs to be ready.
+ * - INDEX_MARK_NOT_VALID, index is marked as not valid for selects. When
+ * marked as not valid, the index needs to be ready.
+ */
+void
+index_concurrent_mark(Oid indOid, IndexMarkOperation operation)
+{
+ Relation pg_index;
+ HeapTuple indexTuple;
+ Form_pg_index indexForm;
+
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ switch(operation)
+ {
+ case INDEX_MARK_READY:
+ Assert(!indexForm->indisready);
+ Assert(!indexForm->indisvalid);
+ indexForm->indisready = true;
+ break;
+
+ case INDEX_MARK_NOT_READY:
+ Assert(indexForm->indisready);
+ Assert(!indexForm->indisvalid);
+ indexForm->indisready = false;
+ break;
+
+ case INDEX_MARK_VALID:
+ Assert(indexForm->indisready);
+ Assert(!indexForm->indisvalid);
+ indexForm->indisvalid = true;
+ break;
+
+ case INDEX_MARK_NOT_VALID:
+ Assert(indexForm->indisready);
+ Assert(indexForm->indisvalid);
+ indexForm->indisvalid = false;
+ break;
+
+ default:
+ /* Do nothing */
+ break;
+ }
+
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CatalogUpdateIndexes(pg_index, indexTuple);
+
+ heap_close(pg_index, RowExclusiveLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new one in a concurrent context. For the time being
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char nameNew[NAMEDATALEN],
+ nameOld[NAMEDATALEN],
+ nameTemp[NAMEDATALEN];
+
+ /* The new index is going to use the name of the old index */
+ snprintf(nameNew, NAMEDATALEN, "%s", get_rel_name(newIndexOid));
+ snprintf(nameOld, NAMEDATALEN, "%s", get_rel_name(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ snprintf(nameTemp, NAMEDATALEN, "cct_%d", oldIndexOid);
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+}
+
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a list of indexes in a concurrent process. Deletion has to be done
+ * through performMultipleDeletions, otherwise dependencies of the indexes
+ * are not dropped.
+ */
+void
+index_concurrent_drop(List *indexIds)
+{
+ ListCell *lc;
+ ObjectAddresses *objects = new_object_addresses();
+
+ Assert(indexIds != NIL);
+
+ /* Scan the list of indexes and build object list */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+ ObjectAddress object;
+
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ object.objectSubId = 0;
+
+ /* Add object to list */
+ add_exact_object_address(&object, objects);
+ }
+
+ /* Perform deletion */
+ performMultipleDeletions(objects, DROP_CASCADE, 0);
+}
+
+
/*
* index_constraint_create
*
@@ -2939,6 +3200,8 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/*
* Open and lock the parent heap relation. ShareLock is sufficient since
* we only need to be sure no schema or data changes are going on.
+ * In the case of concurrent operation, a lower-level lock is taken to
+ * allow INSERT/UPDATE/DELETE operations.
*/
heapId = IndexGetRelation(indexId, false);
heapRelation = heap_open(heapId, ShareLock);
@@ -2946,6 +3209,8 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/*
* Open the target index relation and get an exclusive lock on it, to
* ensure that no one else is touching this particular index.
+ * For concurrent operation, a lower lock is taken to allow INSERT, UPDATE
+ * and DELETE operations.
*/
iRel = index_open(indexId, AccessExclusiveLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a58101e..d9fc8e4 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,12 +68,15 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
Oid relId, Oid oldRelId, void *arg);
+static void WaitForVirtualLocks(LOCKTAG heaplocktag);
+static void WaitForOldSnapshots(Snapshot snapshot);
/*
* CheckIndexCompatible
@@ -305,7 +308,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -314,16 +316,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- Relation pg_index;
- HeapTuple indexTuple;
- Form_pg_index indexForm;
- int i;
/*
* count attributes in index
@@ -449,7 +444,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -659,18 +655,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag);
/*
* At this moment we are sure that there are no transactions with the
@@ -690,50 +676,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
- PushActiveSnapshot(GetTransactionSnapshot());
-
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
+ PushActiveSnapshot(GetTransactionSnapshot());
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- pg_index = heap_open(IndexRelationId, RowExclusiveLock);
-
- indexTuple = SearchSysCacheCopy1(INDEXRELID,
- ObjectIdGetDatum(indexRelationId));
- if (!HeapTupleIsValid(indexTuple))
- elog(ERROR, "cache lookup failed for index %u", indexRelationId);
- indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
-
- Assert(!indexForm->indisready);
- Assert(!indexForm->indisvalid);
-
- indexForm->indisready = true;
-
- simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
- CatalogUpdateIndexes(pg_index, indexTuple);
-
- heap_close(pg_index, RowExclusiveLock);
+ index_concurrent_mark(indexRelationId, INDEX_MARK_READY);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -750,13 +706,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -785,105 +735,277 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
+ * transactions that might have older snapshots.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * Index can now be marked valid -- update its pg_index entry
+ */
+ index_concurrent_mark(indexRelationId, INDEX_MARK_VALID);
+
+ /*
+ * The pg_index update will cause backends (including this one) to update
+ * relcache entries for the index itself, but we should also send a
+ * relcache inval on the parent table to force replanning of cached plans.
+ * Otherwise existing sessions might fail to use the new index where it
+ * would be useful. (Note that our earlier commits did not create reasons
+ * to replan; relcache flush on the index itself was sufficient.)
+ */
+ CacheInvalidateRelcacheByRelid(heaprelid.relId);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table.
+ */
+ UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+
+ return indexRelationId;
+}
+
+
+/*
+ * ReindexConcurrentIndexes
+ *
+ * Process REINDEX CONCURRENTLY for given list of indexes.
+ * Each reindexing step is done simultaneously for all the given
+ * indexes. If no list of indexes is given by the caller, all the
+ * indexes included in the relation will be reindexed.
+ */
+bool
+ReindexConcurrentIndexes(Oid heapOid, List *indexIds)
+{
+ Relation heapRelation;
+ List *concurrentIndexIds = NIL,
+ *indexLocks = NIL,
+ *realIndexIds = indexIds;
+ ListCell *lc, *lc2;
+ LockRelId heapLockId;
+ LOCKTAG heapLocktag;
+ Snapshot snapshot;
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
*
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
+ * Here begins the process of rebuilding the index concurrently. We first
+ * need to create an index based on the same definition as the former
+ * index, except that it is only registered in the catalogs and will be
+ * built later.
+ */
+ /* The lock level used here should match the index lock in index_concurrent_create() */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /*
+ * If relation has a toast relation, it needs to be reindexed too,
+ * but this cannot be done concurrently.
+ */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ reindex_relation(heapRelation->rd_rel->reltoastrelid,
+ REINDEX_REL_PROCESS_TOAST);
+
+ /* Get the list of indexes from relation if caller has not given anything */
+ if (realIndexIds == NIL)
+ realIndexIds = RelationGetIndexList(heapRelation);
+
+ /* Definitely no indexes, so leave */
+ if (realIndexIds == NIL)
+ {
+ heap_close(heapRelation, NoLock);
+ return false;
+ }
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, realIndexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+
+ /* Concurrent reindex of index for exclusion constraint is not supported. */
+ if (indexRel->rd_index->indisexclusion)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for exclusion constraints")));
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(heapOid),
+ NIL,
+ NIL,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(heapRelation, indOid, concurrentName);
+
+ /* Now open the relation of concurrent index, a lock is also needed on it */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each concurrent relation from being
+ * dropped, then close the relations.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ indexLocks = lappend(indexLocks, &lockrelid);
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ indexLocks = lappend(indexLocks, &lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock for the following visibility checks, as other
+ * backends might conflict with this session.
+ */
+ heapLockId = heapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heapLocktag, heapLockId.dbId, heapLockId.relId);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will try
+ * to use it for INSERT or SELECT.
*
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
+ * Before committing, get a session level lock on the relation, the
+ * concurrent index and its copy to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ LockRelationIdForSession(&heapLockId, ShareUpdateExclusiveLock);
+
+ /* Lock each index and each concurrent index accordingly */
+ foreach(lc, indexLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
*
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * We need to wait until no running transaction could have the table open
+ * with the old list of indexes. A concurrent build is then done for each
+ * concurrent index that will replace an old one. All those indexes share
+ * the same snapshot and are built in the same transaction.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
+ WaitForVirtualLocks(heapLocktag);
- for (i = 0; i < n_old_snapshots; i++)
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, realIndexIds)
{
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
+ /* Perform concurrent build of new index */
+ index_concurrent_build(heapOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the table
+ * must insert new entries into the index for insertions and non-HOT updates.
+ */
+ index_concurrent_mark(concurrentOid, INDEX_MARK_READY);
}
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
/*
- * Index can now be marked valid -- update its pg_index entry
+ * Commit this transaction to make the indisready update visible for
+ * the concurrent indexes.
*/
- pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+ CommitTransactionCommand();
+ StartTransactionCommand();
- indexTuple = SearchSysCacheCopy1(INDEXRELID,
- ObjectIdGetDatum(indexRelationId));
- if (!HeapTupleIsValid(indexTuple))
- elog(ERROR, "cache lookup failed for index %u", indexRelationId);
- indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs that
+ * might have occurred in the parent table, and are marked as valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates.
+ */
+ WaitForVirtualLocks(heapLocktag);
- Assert(indexForm->indisready);
- Assert(!indexForm->indisvalid);
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
- indexForm->indisvalid = true;
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ validate_index(heapOid, lfirst_oid(lc), snapshot);
- simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
- CatalogUpdateIndexes(pg_index, indexTuple);
+ /*
+ * Concurrent indexes can now be marked valid -- update pg_index entries
+ */
+ foreach(lc, concurrentIndexIds)
+ index_concurrent_mark(lfirst_oid(lc), INDEX_MARK_VALID);
- heap_close(pg_index, RowExclusiveLock);
+ /*
+ * The concurrent indexes are now valid as they contain all the tuples
+ * necessary. However, they might not contain tuples deleted just before
+ * the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
/*
- * The pg_index update will cause backends (including this one) to update
- * relcache entries for the index itself, but we should also send a
- * relcache inval on the parent table to force replanning of cached plans.
- * Otherwise existing sessions might fail to use the new index where it
- * would be useful. (Note that our earlier commits did not create reasons
- * to replan; relcache flush on the index itself was sufficient.)
+ * The pg_index update will cause backends to update their relcache entries
+ * for the concurrent indexes, but we should also send a relcache inval on
+ * the parent table to force replanning of cached plans.
*/
- CacheInvalidateRelcacheByRelid(heaprelid.relId);
+ CacheInvalidateRelcacheByRelid(heapLockId.relId);
/* we can now do away with our active snapshot */
PopActiveSnapshot();
@@ -891,12 +1013,107 @@ DefineIndex(IndexStmt *stmt,
/* And we can remove the validating snapshot too */
UnregisterSnapshot(snapshot);
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
/*
- * Last thing to do is release the session-level lock on the parent table.
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other transactions once this transaction is committed.
*/
- UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
- return indexRelationId;
+ /* Take reference snapshot used to wait for older snapshots */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Wait for old snapshots, like previously */
+ WaitForOldSnapshots(snapshot);
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, realIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /* Mark the old index as invalid */
+ index_concurrent_mark(indOid, INDEX_MARK_NOT_VALID);
+ }
+
+ /* We can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * Commit this transaction to make the old index invalidation visible.
+ */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for the transactions that might still be using them.
+ */
+ WaitForVirtualLocks(heapLocktag);
+
+ /* Get fresh snapshot for this step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, realIndexIds)
+ index_concurrent_mark(lfirst_oid(lc), INDEX_MARK_NOT_READY);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible.
+ */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion,
+ * or the dependencies of the old indexes will not be dropped with them.
+ */
+ index_concurrent_drop(realIndexIds);
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of the table.
+ */
+ UnlockRelationIdForSession(&heapLockId, ShareUpdateExclusiveLock);
+ foreach(lc, indexLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ return true;
}
@@ -1563,7 +1780,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1589,6 +1807,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1701,18 +1926,26 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
void
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
+
+ /* This is all for the non-concurrent case */
+ if (!concurrent)
+ {
+ reindex_index(indOid, false);
+ return;
+ }
- reindex_index(indOid, false);
+ /* Continue through REINDEX CONCURRENTLY */
+ ReindexConcurrentIndexes(heapOid, list_make1_oid(indOid));
}
/*
@@ -1774,18 +2007,139 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction can have the table open with the index marked as
+ * read-only for updates.
+ * To do this, inquire which xacts currently would conflict with ShareLock on
+ * the table referred to by the LOCKTAG -- ie, which ones have a lock that
+ * permits writing the table. Then wait for each of these xacts to commit
+ * or abort.
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+static void
+WaitForVirtualLocks(LOCKTAG heaplocktag)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+static void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots;
+ int j;
+ int k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
void
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent && !ReindexConcurrentIndexes(heapOid, NIL))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1802,7 +2156,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
void
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1814,6 +2171,12 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /* CONCURRENTLY is not allowed when system catalogs would be reindexed */
+ if (concurrent && do_system)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 139b1bd..72f0178 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3602,6 +3602,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index cebd030..e178928 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1900,6 +1900,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0d3a20d..e6ea1ba 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6607,15 +6607,16 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type qualified_name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
n->relation = $3;
n->name = NULL;
+ n->concurrent = $5;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
@@ -6623,9 +6624,10 @@ ReindexStmt:
n->relation = NULL;
n->do_system = true;
n->do_user = false;
+ n->concurrent = $5;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
@@ -6633,6 +6635,7 @@ ReindexStmt:
n->relation = NULL;
n->do_system = true;
n->do_user = true;
+ n->concurrent = $5;
$$ = (Node *)n;
}
;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index fde2c82..3bad1b5 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1258,15 +1258,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1278,8 +1282,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index eb417ce..5410a6c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -19,6 +19,15 @@
#define DEFAULT_INDEX_TYPE "btree"
+typedef enum IndexMarkOperation
+{
+ INDEX_MARK_VALID,
+ INDEX_MARK_NOT_VALID,
+ INDEX_MARK_READY,
+ INDEX_MARK_NOT_READY
+} IndexMarkOperation;
+
+
/* Typedef for callback function for IndexBuildHeapScan */
typedef void (*IndexBuildCallback) (Relation index,
HeapTuple htup,
@@ -52,6 +61,20 @@ extern Oid index_create(Relation heapRelation,
bool skip_build,
bool concurrent);
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_mark(Oid indOid, IndexMarkOperation operation);
+
+extern void index_concurrent_swap(Oid indexOid1, Oid indexOid2);
+
+extern void index_concurrent_drop(List *IndexIds);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 300f7ea..7cfe94d 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern void ReindexIndex(RangeVar *indexRelation);
-extern void ReindexTable(RangeVar *relation);
+extern void ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern void ReindexTable(RangeVar *relation, bool concurrent);
extern void ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexConcurrentIndexes(Oid heapOid, List *indexIds);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 4fe644e..a4000d3 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2510,6 +2510,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d64c9b6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,41 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int, c2 text);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE concur_reindex_tab CONCURRENTLY; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+CREATE INDEX concur_reindex_tab1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_tab2 ON concur_reindex_tab(c2);
+INSERT INTO concur_reindex_tab VALUES (1,'a');
+INSERT INTO concur_reindex_tab VALUES (2,'a');
+REINDEX INDEX concur_reindex_tab1 CONCURRENTLY;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE pg_database CONCURRENTLY;-- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX DATABASE postgres CONCURRENTLY; -- not allowed for DATABASE
+ERROR: cannot reindex system concurrently
+REINDEX SYSTEM postgres CONCURRENTLY; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer |
+ c2 | text |
+Indexes:
+ "concur_reindex_tab1" btree (c1)
+ "concur_reindex_tab2" btree (c2)
+
+DROP TABLE concur_reindex_tab;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..7b3b036 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,30 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int, c2 text);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE concur_reindex_tab CONCURRENTLY; -- notice
+CREATE INDEX concur_reindex_tab1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_tab2 ON concur_reindex_tab(c2);
+INSERT INTO concur_reindex_tab VALUES (1,'a');
+INSERT INTO concur_reindex_tab VALUES (2,'a');
+REINDEX INDEX concur_reindex_tab1 CONCURRENTLY;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+COMMIT;
+REINDEX TABLE pg_database CONCURRENTLY;-- no shared relation
+REINDEX DATABASE postgres CONCURRENTLY; -- not allowed for DATABASE
+REINDEX SYSTEM postgres CONCURRENTLY; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab;
On 3 October 2012 02:14, Michael Paquier <michael.paquier@gmail.com> wrote:
Well, I spent some spare time working on the implementation of REINDEX
CONCURRENTLY.
Thanks
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
Fair enough
- indexes for exclusion constraints cannot be reindexed concurrently.
- toast relations are reindexed non-concurrently when table reindex is done
and that this table has toast relations
Those restrictions are important ones to resolve since they prevent
the CONCURRENTLY word from being true in a large proportion of cases.
We need to be clear that the remainder of this can be done in user
space already, so the proposal doesn't move us forwards very far,
except in terms of packaging. IMHO this needs to be more than just
moving a useful script into core.
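For context, the user-space procedure in question can be sketched roughly as
follows (index and table names here are hypothetical, and each command must be
run outside a transaction block):

```sql
-- Rough user-space equivalent of REINDEX INDEX ind CONCURRENTLY:
-- build a replacement without blocking writes, then drop and rename.
CREATE INDEX CONCURRENTLY ind_new ON tab (c1);
DROP INDEX CONCURRENTLY ind;
ALTER INDEX ind_new RENAME TO ind;
```

This only covers plain indexes; indexes backing constraints are among the
cases such a script cannot handle, which is where an in-core command could go
further than packaging.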
Here is a description of what happens when reorganizing an index
concurrently
There are four waits for every index, again similar to what is
possible in user space.
When we refactor that, I would like to break things down into N
discrete steps, if possible. Each time we hit a wait barrier, a
top-level process would be able to switch to another task to avoid
waiting. This would then allow us to proceed more quickly through the
task. I would admit that is a later optimisation, but it would be
useful to have the innards refactored to allow for that more easily
later. I'd accept Not yet, if doing that becomes a problem in short
term.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
On Wednesday, October 03, 2012 03:14:17 AM Michael Paquier wrote:
One of the outputs on the discussions about the integration of pg_reorg in
core
was that Postgres should provide some ways to do REINDEX, CLUSTER and ALTER
TABLE concurrently with low-level locks in a way similar to CREATE INDEX
CONCURRENTLY. The discussions done can be found on this thread:
http://archives.postgresql.org/pgsql-hackers/2012-09/msg00746.php
Well, I spent some spare time working on the implementation of REINDEX
CONCURRENTLY.
Very cool!
This basically allows to perform read and write operations on a table whose
index(es) are reindexed at the same time. Pretty useful for a production
environment. The caveats of this feature is that it is slower than normal
reindex, and impacts other backends with the extra CPU, memory and IO it
uses to process. The implementation is based on something on the same ideas
as pg_reorg and on an idea of Andres.
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- toast relations are reindexed non-concurrently when table reindex is done
and that this table has toast relations
Why that restriction?
Here is a description of what happens when reorganizing an index
concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) creation of a new index based on the same columns and restrictions as
the index that is rebuilt (called here old index). This new index has as
name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as
invalid and not ready.
You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that
point already, to prevent schema changes.
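As an illustration of that suggestion, ShareUpdateExclusiveLock lets DML
through while blocking concurrent DDL (a two-session sketch, table name
hypothetical):

```sql
-- Session 1: hold the lock level suggested for the whole operation
BEGIN;
LOCK TABLE tab IN SHARE UPDATE EXCLUSIVE MODE;

-- Session 2: reads and writes still proceed
INSERT INTO tab VALUES (1, 'a');

-- Session 2: schema changes block until session 1 commits
ALTER TABLE tab ADD COLUMN c3 int;  -- waits here
```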
8) Take a reference snapshot and validate the new indexes
Hm. Unless you factor in corrupt indices, why should this be needed?
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well because
indexes are referenced by oid at several places. Swapping pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
Such a behaviour would at least be complicated for pg_depend and
pg_constraint.
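To make the distinction concrete (index name hypothetical): the OID is what
other catalog entries and ::regclass references point at, while the
relfilenode is the on-disk file that a plain REINDEX already replaces:

```sql
SELECT oid, relfilenode FROM pg_class WHERE relname = 'ind';
REINDEX INDEX ind;
-- The oid is unchanged, but relfilenode now points at a new file.
SELECT oid, relfilenode FROM pg_class WHERE relname = 'ind';
```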
The following process might be reducible, but I would like that to be
decided depending on the community feedback and experience on such
concurrent features.
For the time being I took an approach that looks slower, but secured to my
mind with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If during the process an error occurs, the table will finish with either
the old or new index as invalid. In this case the user will be in charge to
drop the invalid index himself.
The concurrent index can be easily identified with its suffix *_cct.
I am not really happy about relying on some arbitrary naming here. That still
can result in conflicts and such.
This patch has required some refactorisation effort as I noticed that the
code of index for concurrent operations was not very generic. In order to do
that, I created some new functions in index.c called index_concurrent_*
which are used by CREATE INDEX and REINDEX in my patch. Some refactoring has
also been done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch structure relies as
much as possible on the functions of index.c when creating, building and
validating the concurrent indexes.
I haven't looked at the patch yet, but I was pretty sure that you would need
to do quite some refactoring to implement this and this looks like roughly the
right direction...
Thanks, and looking forward to your feedback,
I am very happy that you're taking this on!
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com>wrote:
This basically allows to perform read and write operations on a table whose
index(es) are reindexed at the same time. Pretty useful for a production
environment. The caveats of this feature is that it is slower than normal
reindex, and impacts other backends with the extra CPU, memory and IO it
uses to process. The implementation is based on something on the same ideas
as pg_reorg and on an idea of Andres.
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at
some
point, but that very well can be a second phase.
This is something out of scope for the time being honestly. Later? why
not...
- REINDEX CONCURRENTLY cannot run inside a transaction block.
- toast relations are reindexed non-concurrently when table reindex is
done
and that this table has toast relations
Why that restriction?
This is the state of the current version of the patch. And not what the
final version should do. I agree that toast relations should also be
reindexed concurrently as the others. Regarding this current restriction,
my point was just to get some feedback before digging deeper. I should have
told that though...
Here is a description of what happens when reorganizing an index
concurrently
(the beginning of the process is similar to CREATE INDEX CONCURRENTLY):
1) creation of a new index based on the same columns and restrictions as
the index that is rebuilt (called here old index). This new index has as
name $OLDINDEX_cct. So only a suffix _cct is added. It is marked as
invalid and not ready.
You probably should take a SHARE UPDATE EXCLUSIVE lock on the table at that
point already, to prevent schema changes.
8) Take a reference snapshot and validate the new indexes
Hm. Unless you factor in corrupt indices, why should this be needed?
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, so you mean to create an index, then switch only the relfilenode. Why
not. This is largely doable. I think that what is important here is to
choose a way of doing it and keep it until the end.
Such a behaviour would at least be complicated for pg_depend and
pg_constraint.
The following process might be reducible, but I would like that to be
decided depending on the community feedback and experience on such
concurrent features.
For the time being I took an approach that looks slower, but is secure to my
mind, with multiple waits (perhaps sometimes unnecessary?) and
subtransactions.
If during the process an error occurs, the table will finish with either
the old or new index as invalid. In this case the user will be in charge of
dropping the invalid index himself.
The concurrent index can be easily identified with its suffix *_cct.
I am not really happy about relying on some arbitrary naming here. That
still can result in conflicts and such.
The concurrent names are generated automatically with a function in
indexcmds.c, the same way as pkey indexes are named. Let's imagine that the
reindex concurrently command is run twice after a failure. The second
concurrent index will not have *_cct as suffix but _cct1. However I am open
to more ideas here. What I feel about the concurrent index is that it needs
a pg_class entry, even if it is just temporary, and this entry needs a name.
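For illustration only, the naming behaviour described here can be sketched as follows. The helper name and the 63-byte name truncation are assumptions; the actual function in indexcmds.c is not reproduced here.

```python
# Illustrative sketch of the naming scheme described above: append
# "_cct", and on a conflict append "_cct1", "_cct2", and so on.
def choose_concurrent_index_name(old_name, existing_names, max_len=63):
    """Pick a unique name for the transient concurrent index."""
    candidate = (old_name + "_cct")[:max_len]
    n = 0
    while candidate in existing_names:
        n += 1
        candidate = (old_name + "_cct" + str(n))[:max_len]
    return candidate
```

So a first run on index ind would create ind_cct, and a rerun after a failure that left ind_cct behind would create ind_cct1.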
This patch has required some refactoring effort, as I noticed that the
index code for concurrent operations was not very generic. In order to do
that, I created some new functions in index.c called index_concurrent_*
which are used by CREATE INDEX and REINDEX in my patch. Some refactoring has
also been done regarding the wait processes.
REINDEX TABLE and REINDEX INDEX follow the same code path
(ReindexConcurrentIndexes in indexcmds.c). The patch structure relies as
much as possible on the functions of index.c when creating, building and
validating the concurrent index.
I haven't looked at the patch yet, but I was pretty sure that you would
need
to do quite some refactoring to implement this and this looks like roughly
the
right direction...
Thanks for spending time on it.
--
Michael Paquier
http://michael.otacoo.com
On 3 October 2012 09:10, Andres Freund <andres@2ndquadrant.com> wrote:
The following restrictions are applied.
- REINDEX [ DATABASE | SYSTEM ] cannot be run concurrently.
I would like to support something like REINDEX USER TABLES; or similar at some
point, but that very well can be a second phase.
Yes, that would be a nice feature anyway, even without concurrently.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.
Perhaps a good way to tackle it is to have a generic "verify two
indexes are equivalent and swap the underlying relfilenodes" operation
that can be called from both regular reindex and reindex concurrently.
As long as it's the only function that ever locks two indexes then it
can just determine what locking discipline it wants to use.
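A minimal sketch of that idea, assuming the helper simply takes the two locks in ascending-OID order (illustrative Python, not PostgreSQL code; the helper name is made up):

```python
import threading

# Sketch of a single choke point that always acquires the two index
# locks in ascending-OID order, so no two callers can wait on the same
# pair in opposite orders (the classic lock-ordering discipline).
def lock_index_pair(locks_by_oid, oid_a, oid_b):
    """Acquire both index locks in canonical (ascending OID) order."""
    first, second = sorted((oid_a, oid_b))
    locks_by_oid[first].acquire()
    locks_by_oid[second].acquire()
    return first, second
```

Because every caller goes through the same ordering, the deadlock scenario of two sessions waiting on the pair in opposite orders cannot arise.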
--
greg
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
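Steps 6-9 above can be modeled as single-flag transitions, which is what keeps at least one of the two indexes usable for scans at every point. This is an illustrative sketch only, not the patch's code:

```python
# Model of steps 6-9: each phase flips exactly one pg_index flag.
# At every phase at least one of the two indexes is indisvalid, so
# scans always have a usable index during the handover.
def flag_phases():
    new = {"indisready": False, "indisvalid": False}
    old = {"indisready": True, "indisvalid": True}
    phases = []
    new["indisready"] = True   # 6) new index starts receiving inserts
    phases.append((dict(new), dict(old)))
    new["indisvalid"] = True   # 7) new index becomes usable for scans
    phases.append((dict(new), dict(old)))
    old["indisvalid"] = False  # 8) old index no longer used for scans
    phases.append((dict(new), dict(old)))
    old["indisready"] = False  # 9) old index no longer maintained
    phases.append((dict(new), dict(old)))
    return phases
```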
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 3, 2012 at 8:08 PM, Andres Freund <andres@2ndquadrant.com>wrote:
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
Build new index.
6) process till newindex->indisready (no new locks)
validate new index
7) process till newindex->indisvalid (no new locks)
You forgot the swap of the old and new indexes here.
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
The code I sent already does that more or less btw. Just that it can be
more simplified...
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
About the deadlock issues, do you mean the case where 2 sessions are
running REINDEX and/or REINDEX CONCURRENTLY on the same table or index in
parallel?
--
Michael Paquier
http://michael.otacoo.com
On Wednesday, October 03, 2012 01:15:27 PM Michael Paquier wrote:
On Wed, Oct 3, 2012 at 8:08 PM, Andres Freund <andres@2ndquadrant.com>wrote:
On Wednesday, October 03, 2012 12:59:25 PM Greg Stark wrote:
Just for background. The showstopper for REINDEX concurrently was not
that it was particularly hard to actually do the reindexing. But it's
not obvious how to obtain a lock on both the old and new index without
creating a deadlock risk. I don't remember exactly where the deadlock
risk lies but there are two indexes to lock and whichever order you
obtain the locks it might be possible for someone else to be waiting
to obtain them in the opposite order.
I'm sure it's possible to solve the problem. But the footwork needed
to release locks then reobtain them in the right order and verify that
the index hasn't changed out from under you might be a lot of
headache.
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
12) drop old index
The code I sent already does that more or less btw. Just that it can be
more simplified...
The above just tried to describe the stuff that's relevant for locking, maybe I
wasn't clear enough on that ;)
I don't see where the deadlock danger is hidden in that?
I didn't find anything relevant in a quick search of the archives...
About the deadlock issues, do you mean the case where 2 sessions are
running REINDEX and/or REINDEX CONCURRENTLY on the same table or index in
parallel?
No idea. The bit about deadlocks originally came from Greg, not me ;)
I guess it's more the interaction with normal sessions, because the locking
used (SHARE UPDATE EXCLUSIVE) prevents another CONCURRENT action running at the
same time. I don't really see the danger there though, because we should never
need to acquire locks that we don't already have, except the final
AccessExclusiveLock, but that's after we dropped other locks and after the index
is made unusable.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
regards, tom lane
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped. If
we're holding only that one, there shouldn't be any additional deadlock dangers
when dropping the index due to lock upgrades, as we're doing the normal dance
any DROP INDEX does. They seem pretty unlikely in a !valid !ready table
anyway.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped. If
we're holding only that one there shouldn't be any additional deadlock dangers
when dropping the index due to lock upgrades as we're doing the normal dance
any DROP INDEX does. They seem pretty unlikely in a !valid !ready table
Just a note...
My patch drops the locks on parent table and indexes at the end of process, after dropping the old indexes ;)
Michael
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped.
If we're holding only that one there shouldn't be any additional deadlock
dangers when dropping the index due to lock upgrades as we're doing the
normal dance any DROP INDEX does. They seem pretty unlikely in a !valid
!ready table.
Just a note...
My patch drops the locks on parent table and indexes at the end of process,
after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances if those other sessions already have a lower level lock on the
index. That's why I think dropping the lock on the index and then reacquiring
an access exclusive might be necessary.
It's not a too likely scenario, but why not do it right if it's just 3 lines...
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012/10/04, at 5:41, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be dropped.
If we're holding only that one there shouldn't be any additional deadlock
dangers when dropping the index due to lock upgrades as we're doing the
normal dance any DROP INDEX does. They seem pretty unlikely in a !valid
!ready table.
Just a note...
My patch drops the locks on parent table and indexes at the end of process,
after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances if those other sessions already have a lower level lock on the
index. That's why I think dropping the lock on the index and then reacquiring
an access exclusive might be necessary.
It's not a too likely scenario, but why not do it right if it's just 3 lines...
Tom is right. This scenario does not cover the case where you drop the parent table or you drop the index, which is indeed invisible but still has a pg_class and a pg_index entry, from a different session after step 10 and before step 11. So you cannot drop the locks on the indexes either until you are done at step 12.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wednesday, October 03, 2012 11:42:25 PM Michael Paquier wrote:
On 2012/10/04, at 5:41, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 10:12:58 PM Michael Paquier wrote:
On 2012/10/03, at 23:52, Andres Freund <andres@2ndquadrant.com> wrote:
On Wednesday, October 03, 2012 04:28:59 PM Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Maybe I am missing something here, but reindex concurrently should do
1) BEGIN
2) Lock table in share update exclusive
3) lock old index
3) create new index
4) obtain session locks on table, old index, new index
5) commit
6) process till newindex->indisready (no new locks)
7) process till newindex->indisvalid (no new locks)
8) process till !oldindex->indisvalid (no new locks)
9) process till !oldindex->indisready (no new locks)
10) drop all session locks
11) lock old index exclusively which should be "invisible" now
12) drop old index
You can't drop the session locks until you're done. Consider somebody
else trying to do a DROP TABLE between steps 10 and 11, for instance.
Yea, the session lock on the table itself probably shouldn't be
dropped. If we're holding only that one there shouldn't be any
additional deadlock dangers when dropping the index due to lock
upgrades as we're doing the normal dance any DROP INDEX does. They seem
pretty unlikely in a !valid !ready table.
Just a note...
My patch drops the locks on parent table and indexes at the end of
process, after dropping the old indexes ;)
I think that might result in deadlocks with concurrent sessions in some
circumstances if those other sessions already have a lower level lock on
the index. That's why I think dropping the lock on the index and then
reacquiring an access exclusive might be necessary.
It's not a too likely scenario, but why not do it right if it's just 3
lines...
Tom is right. This scenario does not cover the
parent table or you drop the index, which is indeed invisible, but still
has a pg_class and a pg_index entry, from a different session after step
10 and before step 11. So you cannot drop the locks on the indexes either
until you are done at step 12.
Yep:
Yea, the session lock on the table itself probably shouldn't be dropped.
But that does *not* mean you cannot avoid lock upgrade issues by dropping the
lower level lock on the index first and only then acquiring the access exclusive
lock. Note that dropping an index always includes *first* getting a lock on the
table so doing it that way is safe and just the same as a normal drop index.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com>wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching relfilenode would be a
better choice as it points to the physical storage itself. It looks more
straight-forward than switching oids, and takes the switch at the root.
Btw, there is still something I wanted to clarify. You mention in your
ideas "old" and "new" indexes,
such as we create a new index at the beginning and drop the old one at
the end. This is not completely true in the case of switching relfilenode.
What happens is that we create a new index with a new physical storage,
then at the swap step we switch the old storage and the new storage. Once swap
is done, the index that needs to be set as invalid and not ready is not the
old index, but the index created at the beginning of the process that now has
the old relfilenode. The relation that is indeed dropped at the end of the
process is also the index with the old relfilenode, so the index actually
created at the beginning of the process. I understand that this is playing with
the words, but I just wanted to confirm that we are on the same line.
--
Michael Paquier
http://michael.otacoo.com
Michael Paquier <michael.paquier@gmail.com> writes:
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com>wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching relfilenode would be a
better choice as it points to the physical storage itself. It looks more
straight-forward than switching oids, and takes the switch at the root.
Andres is quite right that "switch by name" is out of the question ---
for the most part, the system pays no attention to index names at all.
It just gets a list of the OIDs of indexes belonging to a table and
works with that.
However, I'm pretty suspicious of the idea of switching relfilenodes as
well. You generally can't change the relfilenode of a relation (either
a table or an index) without taking an exclusive lock on it, because
changing the relfilenode *will* break any concurrent operations on the
index. And there is not anyplace in the proposed sequence where it's
okay to have exclusive lock on both indexes, at least not if the goal
is to not block concurrent updates at any time.
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I'm really not convinced that we have a bulletproof solution yet,
at least not if you insist on the replacement index having the same name
as the original. How badly do we need that?
regards, tom lane
On 2012/10/04, at 10:00, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Wed, Oct 3, 2012 at 5:10 PM, Andres Freund <andres@2ndquadrant.com>wrote:
14) Swap new and old indexes, consisting here in switching their names.
I think switching based on their names is not going to work very well
because
indexes are referenced by oid at several places. Swapping
pg_index.indexrelid
or pg_class.relfilenode seems to be the better choice to me. We expect
relfilenode changes for such commands, but not ::regclass oid changes.
OK, if there is a choice to be made, switching relfilenode would be a
better choice as it points to the physical storage itself. It looks more
straight-forward than switching oids, and takes the switch at the root.
Andres is quite right that "switch by name" is out of the question ---
for the most part, the system pays no attention to index names at all.
It just gets a list of the OIDs of indexes belonging to a table and
works with that.
Sure. Switching by changing the index name is just the direction taken by the first version of the patch, and only that. I wrote this version without really looking for a bulletproof solution, but only to have something to discuss.
However, I'm pretty suspicious of the idea of switching relfilenodes as
well. You generally can't change the relfilenode of a relation (either
a table or an index) without taking an exclusive lock on it, because
changing the relfilenode *will* break any concurrent operations on the
index. And there is not anyplace in the proposed sequence where it's
okay to have exclusive lock on both indexes, at least not if the goal
is to not block concurrent updates at any time.
OK. As the goal is to allow concurrent operations, this is not reliable either. So what remains is the method of switching the OIDs of the old and new indexes in pg_index? Any other candidates?
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I'm really not convinced that we have a bulletproof solution yet,
at least not if you insist on the replacement index having the same name as the original. How badly do we need that?
And we do not really need such a solution as I am not insisting on the method that switches indexes by changing names. I am open to a reliable and robust method, and I hope this method could be decided in this thread.
Thanks for those arguments; I feel they are really leading the discussion in the right direction.
Thanks.
Michael
On Thu, Oct 4, 2012 at 2:19 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I think what you'd have to do is drop the old index (relying on the
assumption that no one is accessing it anymore after a certain point, so
you can take exclusive lock on it now) and then rename the new index
to have the old index's name. However, renaming an index without
exclusive lock on it still seems a bit risky. Moreover, what if you
crash right after committing the drop of the old index?
I think this would require a new state which is the converse of
indisvalid=f. Right now there's no state the index can be in that
means the index should be ignored for both scans and maintenance but
might have old sessions that might be using it or maintaining it.
I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does. It seems flaky to have two
different mechanisms depending on which mode is being used. It seems
more conservative to use the same mechanism and just figure out what's
required to ensure it's safe in both modes. At least there won't be
any bugs from unexpected consequences that aren't locking related if
it's using the same mechanics.
--
greg
Greg Stark <stark@mit.edu> writes:
I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does.
Swapping the relfilenodes is fine *as long as you have exclusive lock*.
The trick is to make it safe without that. It will definitely not work
to do that without exclusive lock, because at the instant you would try
it, people will be accessing the new index (by OID).
regards, tom lane
On Thu, Oct 4, 2012 at 11:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Greg Stark <stark@mit.edu> writes:
I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does.
Swapping the relfilenodes is fine *as long as you have exclusive lock*.
The trick is to make it safe without that. It will definitely not work
to do that without exclusive lock, because at the instant you would try
it, people will be accessing the new index (by OID).
OK, so index swapping could be done by:
1) Index name switch. This is not considered safe, as the system pays no
attention to index names at all.
2) relfilenode switch. An ExclusiveLock is necessary. The lock that would be
taken is not compatible with a concurrent operation, except if we consider
that the lock will not be taken for a long time, only during the swap
moment. Reindex uses this mechanism, so it would be good for consistency.
3) Switch the OIDs of indexes. Looks safe from the system perspective, and
it will be necessary to invalidate the cache entries for both relations
after the swap. Any opinions on this one?
--
Michael Paquier
http://michael.otacoo.com
On Thursday, October 04, 2012 04:51:29 AM Tom Lane wrote:
Greg Stark <stark@mit.edu> writes:
I'm a bit puzzled why we're so afraid of swapping the relfilenodes
when that's what the current REINDEX does.
Swapping the relfilenodes is fine *as long as you have exclusive lock*.
The trick is to make it safe without that. It will definitely not work
to do that without exclusive lock, because at the instant you would try
it, people will be accessing the new index (by OID).
I can understand hesitation around that... I would like to make sure I
understand the problem correctly. When we get to the point where we switch
indexes we should be in the following state:
- both indexes are indisready
- old should be invalid
- new index should be valid
- have the same indcheckxmin
- be locked by us preventing anybody else from making changes
Lets assume we have index a_old(relfilenode 1) as the old index and a rebuilt
index a_new (relfilenode 2) as the one we just built. If we do it properly
nobody will have 'a' open for querying, just for modifications (it's indisready)
as we had waited for everyone that could have seen a as valid to finish.
As far as I understand the code a session using a_new will also have built a
relcache entry for a_old.
Two problems:
* relying on the relcache to be built for both indexes seems hinky
* As the relcache is built with SnapshotNow it could read the old definition
for a_new and the new one for a_old (or the reverse) and thus end up with both
pointing to the same relfilenode. Which would be ungood.
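That second problem can be shown with a toy model (pure illustration, not PostgreSQL code): when a backend's two catalog lookups straddle the swap, it sees both indexes carrying the same relfilenode.

```python
# Toy model of the SnapshotNow hazard: the swap's two catalog updates
# are not seen atomically, so a backend whose lookups straddle the swap
# can end up with both indexes pointing at the same relfilenode.
def relfilenodes_seen(first_lookup, second_lookup):
    catalog = {"a_old": 1, "a_new": 2}
    seen = {first_lookup: catalog[first_lookup]}   # read before the swap
    # the relfilenode swap commits between the two lookups...
    catalog["a_old"], catalog["a_new"] = catalog["a_new"], catalog["a_old"]
    seen[second_lookup] = catalog[second_lookup]   # read after the swap
    return seen
```

Looking up a_new first and a_old second yields both entries at relfilenode 2; the reverse order yields both at relfilenode 1.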
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 5, 2012 at 6:58 AM, Andres Freund <andres@2ndquadrant.com>wrote:
On Thursday, October 04, 2012 04:51:29 AM Tom Lane wrote:
I can understand hesitation around that.. I would like to make sure I
understand the problem correctly. When we get to the point where we switch
indexes we should be in the following state:
- both indexes are indisready
- old should be invalid
- new index should be valid
- have the same indcheckxmin
- be locked by us preventing anybody else from making changes
Looks like a good presentation of the problem. I am not sure if marking the
new index as valid is necessary though. As long as it is done inside the
same transaction as the swap there are no problems, no?
Lets assume we have index a_old(relfilenode 1) as the old index and a
rebuilt
index a_new (relfilenode 2) as the one we just built. If we do it properly
nobody will have 'a' open for querying, just for modifications (its
indisready)
as we had waited for everyone that could have seen a as valid to finish.
As far as I understand the code a session using a_new will also have built
a
relcache entry for a_old.
Two problems:
* relying on the relcache to be built for both indexes seems hinky
* As the relcache is built with SnapshotNow it could read the old
definition
for a_new and the new one for a_old (or the reverse) and thus end up with
both
pointing to the same relfilenode. Which would be ungood.
OK, so the problem here is that the relcache, like the syscache, relies
on SnapshotNow, which cannot be used safely here as a wrong index definition
could be read by other backends. So this brings the discussion back
to the point where a higher lock level is necessary to perform a safe
switch of the indexes.
I assume that the switch phase is not the longest phase of the concurrent
operation, as you also need to build and validate the new index in prior
steps. I am just wondering if it is acceptable to you guys to take a
stronger lock only during this switch phase. This would not make the reindex
concurrent all the time, but it would avoid any visibility issues
and make the index switch processing more consistent with the
existing implementation, as it could rely on the same mechanism as a normal
reindex that switches relfilenodes.
--
Michael Paquier
http://michael.otacoo.com
Michael Paquier <michael.paquier@gmail.com> writes:
OK, so the problem here is that the relcache, like the syscache, relies
on SnapshotNow, which cannot be used safely here as an inconsistent index
definition could be read by other backends.
That's one problem. It's definitely not the only one, if we're trying
to change an index's definition while an index-accessing operation is in
progress.
I assume that the switch phase is not the longest phase of the concurrent
operation, as you also need to build and validate the new index in prior
steps. I am just wondering if it is acceptable to you guys to take a
stronger lock only during this switch phase.
We might be forced to fall back on such a solution, but it's pretty
undesirable. Even though the exclusive lock would only need to be held
for a short time, it can create a big hiccup in processing. The key
reason is that once the ex-lock request is queued, it blocks ordinary
operations coming in behind it. So effectively it's stopping operations
not just for the length of time the lock is *held*, but for the length
of time it's *awaited*, which could be quite long.
Note that allowing subsequent requests to jump the queue would not be a
good fix for this; if you do that, it's likely the ex-lock will never be
granted, at least not till the next system idle time. Which if you've
got one, you don't need a feature like this at all; you might as well
just reindex normally during your idle time.
regards, tom lane
Tom Lane escribió:
Note that allowing subsequent requests to jump the queue would not be a
good fix for this; if you do that, it's likely the ex-lock will never be
granted, at least not till the next system idle time. Which if you've
got one, you don't need a feature like this at all; you might as well
just reindex normally during your idle time.
Not really. The time to run a complete reindex might be several hours.
If the idle time is just a few minutes or seconds long, it may be more
than enough to complete the switch operation, but not to run the
complete reindex.
Maybe another idea is that the reindexing is staged: the user would
first run a command to create the replacement index, and leave both
present until the user runs a second command (which acquires a strong
lock) that executes the switch. Somewhat similar to a constraint created
as NOT VALID (which runs without a strong lock) which can be later
validated separately.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Maybe another idea is that the reindexing is staged: the user would
first run a command to create the replacement index, and leave both
present until the user runs a second command (which acquires a strong
lock) that executes the switch. Somehow similar to a constraint created
as NOT VALID (which runs without a strong lock) which can be later
validated separately.
Yeah. We could consider
CREATE INDEX CONCURRENTLY (already exists)
SWAP INDEXES (requires ex-lock, swaps names and constraint dependencies;
or maybe just implement as swap of relfilenodes?)
DROP INDEX CONCURRENTLY
The last might have some usefulness in its own right, anyway.
regards, tom lane
On Sat, Oct 6, 2012 at 6:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
Maybe another idea is that the reindexing is staged: the user would
first run a command to create the replacement index, and leave both
present until the user runs a second command (which acquires a strong
lock) that executes the switch. Somehow similar to a constraint created
as NOT VALID (which runs without a strong lock) which can be later
validated separately.

Yeah. We could consider
CREATE INDEX CONCURRENTLY (already exists)
SWAP INDEXES (requires ex-lock, swaps names and constraint dependencies;
or maybe just implement as swap of relfilenodes?)
DROP INDEX CONCURRENTLY
OK. That is a different approach and would strictly limit the amount of
code necessary for the feature, but I feel that it breaks the nature of
CONCURRENTLY, which should run without any exclusive locks. Being able
to do all that in a single command would perhaps also be better from the
user's point of view.
Until now all the approaches investigated (switch of relfilenode, switch of
index OID) need an exclusive lock because we try to keep the index
OID consistent. In the patch I submitted, the new index created has a
different OID than the old index, and the command simply switches the names.
So after REINDEX CONCURRENTLY the OID of the index on the table is different,
but seen from the user the name is the same. Is it acceptable for a reindex
concurrently to change the OID of the index rebuilt? Is it a postgres
requirement to keep object OIDs consistent across DDL operations?
If the OIDs of the old and new index are different, the relcache entries of
each index will be completely separate, and this would take care of any
visibility problems. pg_reorg, for example, changes the
relation OID of the reorganized table after the operation is completed.
Thoughts about that?
--
Michael Paquier
http://michael.otacoo.com
Michael Paquier <michael.paquier@gmail.com> writes:
On Sat, Oct 6, 2012 at 6:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
CREATE INDEX CONCURRENTLY (already exists)
SWAP INDEXES (requires ex-lock, swaps names and constraint dependencies;
or maybe just implement as swap of relfilenodes?)
DROP INDEX CONCURRENTLY
OK. That is a different approach and would strictly limit the amount of
code necessary for the feature, but I feel that it breaks the nature of
CONCURRENTLY, which should run without any exclusive locks.
Hm? The whole point is that the CONCURRENTLY commands don't require
exclusive locks. Only the SWAP command would.
Until now all the approaches investigated (switch of relfilenode, switch of
index OID) need an exclusive lock because we try to keep the index
OID consistent. In the patch I submitted, the new index created has a
different OID than the old index, and the command simply switches the names.
So after REINDEX CONCURRENTLY the OID of the index on the table is different,
but seen from the user the name is the same. Is it acceptable for a reindex
concurrently to change the OID of the index rebuilt?
That is not going to work without ex-lock somewhere. If you change the
index's OID then you will have to change pg_constraint and pg_depend
entries referencing it, and that creates race condition hazards for
other processes looking at those catalogs. I'm not convinced that you
can even do a rename safely without ex-lock. Basically, any DDL update
on an active index is going to be dangerous and probably impossible
without lock, IMO.
To answer your question, I don't think anyone would object to the
index's OID changing if the operation were safe otherwise. But I don't
think that allowing that gets us to a safe solution.
regards, tom lane
On Sat, Oct 6, 2012 at 8:40 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Michael Paquier <michael.paquier@gmail.com> writes:
On Sat, Oct 6, 2012 at 6:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
OK. That is a different approach and would strictly limit the amount of
code necessary for the feature, but I feel that it breaks the nature of
CONCURRENTLY, which should run without any exclusive locks.

Hm? The whole point is that the CONCURRENTLY commands don't require
exclusive locks. Only the SWAP command would.
Yes, but my point is that it is more user-friendly to have such
functionality in a single command.
By having something without locks, you could use the concurrent APIs to
perform a REINDEX automatically in autovacuum, for example.
Also, the possibility to perform concurrent operations entirely without
exclusive locks is not a problem limited to REINDEX; there would surely
be similar problems if CLUSTER CONCURRENTLY or ALTER TABLE CONCURRENTLY
are ever wanted.
Until now all the approaches investigated (switch of relfilenode, switch of
index OID) need an exclusive lock because we try to keep the index
OID consistent. In the patch I submitted, the new index created has a
different OID than the old index, and the command simply switches the names.
So after REINDEX CONCURRENTLY the OID of the index on the table is different,
but seen from the user the name is the same. Is it acceptable for a reindex
concurrently to change the OID of the index rebuilt?
That is not going to work without ex-lock somewhere. If you change the
index's OID then you will have to change pg_constraint and pg_depend
entries referencing it, and that creates race condition hazards for
other processes looking at those catalogs. I'm not convinced that you
can even do a rename safely without ex-lock. Basically, any DDL update
on an active index is going to be dangerous and probably impossible
without lock, IMO.
In the current version of the patch, at the beginning of the process a new
index is created. It is a twin of the index it has to replace, meaning that
it copies the dependencies of the old index, creating twin entries in
pg_depend and pg_constraint if necessary. So the old
index and the new index have exactly the same data in the catalogs, they are
completely decoupled, and you do not need to worry about the OID
replacements and the visibility consequences.
Knowing that both indexes are completely separate entities, isn't this
enough to switch the new index with the old one under a low-level lock? In
the case of my patch the names are simply exchanged, keeping the user
unaware of what is happening in the background. This behaves similarly to
pg_reorg, which is why the OIDs of reorganized tables are changed after
being pg_reorg'ed.
To answer your question, I don't think anyone would object to the
index's OID changing if the operation were safe otherwise. But I don't
think that allowing that gets us to a safe solution.
OK thanks.
--
Michael Paquier
http://michael.otacoo.com
On 10/05/2012 09:03 PM, Tom Lane wrote:
Note that allowing subsequent requests to jump the queue would not be a
good fix for this; if you do that, it's likely the ex-lock will never be
granted, at least not till the next system idle time.
Offering that option to the admin sounds like a good thing, since
(as Alvaro points out) the build of the replacement index could take
considerable time but be done without the lock. Then the swap could be
done in the first quiet period (but without further admin action),
and the drop started.
One size doesn't fit all. It doesn't need to be the only method.
--
Cheers,
Jeremy
On 10/5/12 9:57 PM, Michael Paquier wrote:
In the current version of the patch, at the beginning of the process a new index is created. It is a twin of the index it has to replace, meaning that it copies the dependencies of the old index, creating twin entries in pg_depend and pg_constraint if necessary. So the old index and the new index have exactly the same data in the catalogs, they are completely decoupled, and you do not need to worry about the OID replacements and the visibility consequences.
Yeah, what's the risk of renaming an index during concurrent access? The only thing I can think of is an "old" backend referring to the wrong index name in an elog. That's certainly not great, but could possibly be dealt with.
Are there any other things that are directly tied to the name of an index (or of any object for that matter)?
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Monday, October 08, 2012 11:57:46 PM Jim Nasby wrote:
On 10/5/12 9:57 PM, Michael Paquier wrote:
In the current version of the patch, at the beginning of process a new
index is created. It is a twin of the index it has to replace, meaning
that it copies the dependencies of old index and creates twin entries of
the old index even in pg_depend and pg_constraint also if necessary. So
the old index and the new index have exactly the same data in catalog,
they are completely decoupled, and you do not need to worry about the
OID replacements and the visibility consequences.

Yeah, what's the risk of renaming an index during concurrent access? The
only thing I can think of is an "old" backend referring to the wrong index
name in an elog. That's certainly not great, but could possibly be dealt
with.
We cannot have two indexes with the same oid in the catalog, so the two
different names will have to have different oids. Unfortunately the index's oid
is referred to by other tables (e.g. pg_constraint), so renaming the indexes
while differing in the oid isn't really helpful :(...
Right now I don't see anything that would make switching oids easier than
relfilenodes.
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Jim Nasby <jim@nasby.net> writes:
Yeah, what's the risk of renaming an index during concurrent access?
SnapshotNow searches for the pg_class row could get broken by *any*
transactional update of that row, whether it's for a change of relname
or some other field.
A lot of these problems would go away if we rejiggered the definition of
SnapshotNow to be more like MVCC. We have discussed that in the past,
but IIRC it's not exactly a simple or risk-free change in itself.
Still, maybe we should start thinking about doing that instead of trying
to make REINDEX CONCURRENTLY safe given the existing infrastructure.
regards, tom lane
On 10/8/12 5:08 PM, Andres Freund wrote:
On Monday, October 08, 2012 11:57:46 PM Jim Nasby wrote:
On 10/5/12 9:57 PM, Michael Paquier wrote:
In the current version of the patch, at the beginning of process a new
index is created. It is a twin of the index it has to replace, meaning
that it copies the dependencies of old index and creates twin entries of
the old index even in pg_depend and pg_constraint also if necessary. So
the old index and the new index have exactly the same data in catalog,
they are completely decoupled, and you do not need to worry about the
OID replacements and the visibility consequences.

Yeah, what's the risk of renaming an index during concurrent access? The
only thing I can think of is an "old" backend referring to the wrong index
name in an elog. That's certainly not great, but could possibly be dealt
with.

We cannot have two indexes with the same oid in the catalog, so the two
different names will have to have different oids. Unfortunately the index's oid
is referred to by other tables (e.g. pg_constraint), so renaming the indexes
while differing in the oid isn't really helpful :(...
Hrm... the claim was made that everything relating to the index, including pg_depend and pg_constraint, got duplicated. But I don't know how you could duplicate a constraint without also playing name games. Perhaps name games are being played there as well...
Right now I don't see anything that would make switching oids easier than
relfilenodes.
Yeah... in order to make either of those schemes work I think there would need to be non-trivial internal changes so that we weren't just passing around raw OIDs/filenodes.
BTW, it occurs to me that this problem might be easier to deal with if we had support for accessing the catalog with the same snapshot as the main query was using... IIRC that's been discussed in the past for other issues.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On 10/8/12 6:12 PM, Tom Lane wrote:
Jim Nasby <jim@nasby.net> writes:
Yeah, what's the risk of renaming an index during concurrent access?
SnapshotNow searches for the pg_class row could get broken by *any*
transactional update of that row, whether it's for a change of relname
or some other field.

A lot of these problems would go away if we rejiggered the definition of
SnapshotNow to be more like MVCC. We have discussed that in the past,
but IIRC it's not exactly a simple or risk-free change in itself.
Still, maybe we should start thinking about doing that instead of trying
to make REINDEX CONCURRENTLY safe given the existing infrastructure.
Yeah, I was just trying to remember what other situations this has come up in. My recollection is that there's been a couple other cases where that would be useful.
My recollection is also that such a change would be rather large... but it might be smaller than all the other work-arounds that are needed because we don't have that...
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Tue, Oct 9, 2012 at 8:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jim Nasby <jim@nasby.net> writes:
Yeah, what's the risk of renaming an index during concurrent access?
SnapshotNow searches for the pg_class row could get broken by *any*
transactional update of that row, whether it's for a change of relname
or some other field.
Does that include updates on the relation names in pg_class, or the ready and
valid flags in pg_index? Tables refer to indexes with OIDs only, so if
the index and its concurrent twin are completely separate entries in pg_index,
pg_constraint and pg_class, what is the problem?
Is it that the Relation fetched from the system cache might become
inconsistent because of SnapshotNow?
A lot of these problems would go away if we rejiggered the definition of
SnapshotNow to be more like MVCC. We have discussed that in the past,
but IIRC it's not exactly a simple or risk-free change in itself.
Still, maybe we should start thinking about doing that instead of trying
to make REINDEX CONCURRENTLY safe given the existing infrastructure.
+1. This is something to dig into if operations like an OID switch are
envisaged for concurrent operations. This does not concern only REINDEX;
things like CLUSTER or ALTER TABLE would need something similar.
--
Michael Paquier
http://michael.otacoo.com
On Tue, Oct 9, 2012 at 8:14 AM, Jim Nasby <jim@nasby.net> wrote:
Hrm... the claim was made that everything relating to the index, including
pg_depend and pg_constraint, got duplicated. But I don't know how you
could duplicate a constraint without also playing name games. Perhaps name
games are being played there as well...
Yes, it is what was originally intended. Please note the pg_constraint
entry was not duplicated correctly in the first version of the patch
because of a bug I already fixed.
I will provide another version soon if necessary.
Right now I don't see anything that would make switching oids easier than
relfilenodes.

Yeah... in order to make either of those schemes work I think there would
need to be non-trivial internal changes so that we weren't just passing
around raw OIDs/filenodes.

BTW, it occurs to me that this problem might be easier to deal with if we
had support for accessing the catalog with the same snapshot as the main
query was using... IIRC that's been discussed in the past for other issues.
Yes, it would be better and helpful to have such a mechanism even for other
operations.
--
Michael Paquier
http://michael.otacoo.com
* Jim Nasby (jim@nasby.net) wrote:
Yeah, I was just trying to remember what other situations this has come up in. My recollection is that there's been a couple other cases where that would be useful.
Yes, I've run into similar issues in the past also. It'd be really neat
to somehow make the SnapshotNow (and I'm guessing the whole SysCache
system) behave more like MVCC.
My recollection is also that such a change would be rather large... but it might be smaller than all the other work-arounds that are needed because we don't have that...
Perhaps.. Seems like it'd be a lot of work tho, to do it 'right', and I
suspect there's a lot of skeletons out there that we'd run into..
Thanks,
Stephen
Hi all,
Please find attached the version 2 of the patch for this feature, it
corrects the following things:
- toast relations are now rebuilt concurrently as well as other indexes
- concurrent constraint indexes (PRIMARY KEY, UNIQUE, EXCLUSION) are
dropped correctly at the end of the process
- exclusion constraints are supported; at least they look to work correctly.
- Fixed a couple of bugs when constraint indexes were involved in the process.
I am adding this version to the commit fest of next month for review.
Regards,
--
Michael Paquier
http://michael.otacoo.com
Attachments:
20121012_reindex_concurrent_v2.patch
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..2931329 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ] [ CONCURRENTLY ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,10 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will not perform a concurrent build if <literal>
+ CONCURRENTLY</> is not specified. To build the index without interfering
+ with production you should drop the index and reissue the <command>CREATE
+ INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> command.
</para>
</listitem>
@@ -139,6 +140,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +247,93 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index that will replace the one to
+ be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to build the new
+ index and make it valid for the other backends. Once this is done, the old
+ and fresh indexes are swapped, and the old index is marked as invalid
+ in a third transaction. Finally, two additional transactions are used to mark
+ the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and perform <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix cct.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ </refsect2>
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 464950b..03be2b3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -664,6 +664,10 @@ UpdateIndexRelation(Oid indexoid,
* concurrent: if true, do not lock the table against writers. The index
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be a toast relation. Sufficient locks are normally taken on
+ * the related relations once this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -686,7 +690,8 @@ index_create(Relation heapRelation,
bool initdeferred,
bool allow_system_table_mods,
bool skip_build,
- bool concurrent)
+ bool concurrent,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -722,26 +727,31 @@ index_create(Relation heapRelation,
if (!allow_system_table_mods &&
IsSystemRelation(heapRelation) &&
- IsNormalProcessingMode())
+ IsNormalProcessingMode() &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("user-defined indexes on system catalog tables are not supported")));
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1076,6 +1086,311 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initdeferred; this depends on its
+ * dependent constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. The low-level locks taken
+ * during this operation prevent only schema changes.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_mark
+ *
+ * Update the pg_index row to mark the index with a new status. All the
+ * operations that can be performed on the index marking are listed in
+ * IndexMarkOperation.
+ * After a marking modification is done, the caller needs to commit the
+ * current transaction, so that new transactions opening the table see
+ * the updated status before performing read or write operations on it.
+ * - INDEX_MARK_READY, the index is marked as ready for inserts. The
+ * index needs to be neither ready nor valid beforehand.
+ * - INDEX_MARK_NOT_READY, the index is marked as not ready for inserts.
+ * The index needs to be ready and invalid beforehand.
+ * - INDEX_MARK_VALID, the index is marked as valid for selects. The
+ * index needs to be ready but not yet valid beforehand.
+ * - INDEX_MARK_NOT_VALID, the index is marked as not valid for selects.
+ * The index needs to be both ready and valid beforehand.
+ */
+void
+index_concurrent_mark(Oid indOid, IndexMarkOperation operation)
+{
+ Relation pg_index;
+ HeapTuple indexTuple;
+ Form_pg_index indexForm;
+
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ switch(operation)
+ {
+ case INDEX_MARK_READY:
+ Assert(!indexForm->indisready);
+ Assert(!indexForm->indisvalid);
+ indexForm->indisready = true;
+ break;
+
+ case INDEX_MARK_NOT_READY:
+ Assert(indexForm->indisready);
+ Assert(!indexForm->indisvalid);
+ indexForm->indisready = false;
+ break;
+
+ case INDEX_MARK_VALID:
+ Assert(indexForm->indisready);
+ Assert(!indexForm->indisvalid);
+ indexForm->indisvalid = true;
+ break;
+
+ case INDEX_MARK_NOT_VALID:
+ Assert(indexForm->indisready);
+ Assert(indexForm->indisvalid);
+ indexForm->indisvalid = false;
+ break;
+
+ default:
+ elog(ERROR, "unrecognized index mark operation: %d",
+ (int) operation);
+ break;
+ }
+
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CatalogUpdateIndexes(pg_index, indexTuple);
+
+ heap_close(pg_index, RowExclusiveLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old index with the new one in a concurrent context. For the
+ * time being, what is done here is switching the relation names of the
+ * two indexes. If extra operations are necessary during a concurrent
+ * swap, processing should be added here.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char nameNew[NAMEDATALEN],
+ nameOld[NAMEDATALEN],
+ nameTemp[NAMEDATALEN];
+
+ /* The new index is going to use the name of the old index */
+ snprintf(nameNew, NAMEDATALEN, "%s", get_rel_name(newIndexOid));
+ snprintf(nameOld, NAMEDATALEN, "%s", get_rel_name(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ snprintf(nameTemp, NAMEDATALEN, "cct_%u", oldIndexOid);
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+}
+
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a list of indexes as the last step of a concurrent process. The
+ * deletion has to be done through performMultipleDeletions, or the
+ * dependencies of the indexes would not be dropped with them.
+ */
+void
+index_concurrent_drop(List *indexIds)
+{
+ ListCell *lc;
+ ObjectAddresses *objects = new_object_addresses();
+
+ Assert(indexIds != NIL);
+
+ /* Scan the list of indexes and build object list for normal indexes */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+
+ /* Register constraint or index for drop */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Add object to list */
+ add_exact_object_address(&object, objects);
+ }
+
+ /* Perform deletion for normal indexes */
+ performMultipleDeletions(objects, DROP_CASCADE, 0);
+}
+
+
/*
* index_constraint_create
*
@@ -2939,6 +3254,8 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/*
* Open and lock the parent heap relation. ShareLock is sufficient since
* we only need to be sure no schema or data changes are going on.
+ * In the case of concurrent operation, a lower-level lock is taken to
+ * allow INSERT/UPDATE/DELETE operations.
*/
heapId = IndexGetRelation(indexId, false);
heapRelation = heap_open(heapId, ShareLock);
@@ -2946,6 +3263,8 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/*
* Open the target index relation and get an exclusive lock on it, to
* ensure that no one else is touching this particular index.
+ * For concurrent operation, a lower lock is taken to allow INSERT, UPDATE
+ * and DELETE operations.
*/
iRel = index_open(indexId, AccessExclusiveLock);
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 1feffd2..5a5ecde 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -279,7 +279,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false);
+ true, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index a58101e..ed1f262 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,12 +68,15 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
Oid relId, Oid oldRelId, void *arg);
+static void WaitForVirtualLocks(LOCKTAG heaplocktag);
+static void WaitForOldSnapshots(Snapshot snapshot);
/*
* CheckIndexCompatible
@@ -305,7 +308,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -314,16 +316,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- Relation pg_index;
- HeapTuple indexTuple;
- Form_pg_index indexForm;
- int i;
/*
* count attributes in index
@@ -449,7 +444,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -596,7 +592,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent);
+ stmt->concurrent, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -659,18 +655,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag);
/*
* At this moment we are sure that there are no transactions with the
@@ -690,50 +676,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- pg_index = heap_open(IndexRelationId, RowExclusiveLock);
-
- indexTuple = SearchSysCacheCopy1(INDEXRELID,
- ObjectIdGetDatum(indexRelationId));
- if (!HeapTupleIsValid(indexTuple))
- elog(ERROR, "cache lookup failed for index %u", indexRelationId);
- indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
-
- Assert(!indexForm->indisready);
- Assert(!indexForm->indisvalid);
-
- indexForm->indisready = true;
-
- simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
- CatalogUpdateIndexes(pg_index, indexTuple);
-
- heap_close(pg_index, RowExclusiveLock);
+ index_concurrent_mark(indexRelationId, INDEX_MARK_READY);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -750,13 +706,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -785,105 +735,355 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
+ * transactions that might have older snapshots.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * Index can now be marked valid -- update its pg_index entry
+ */
+ index_concurrent_mark(indexRelationId, INDEX_MARK_VALID);
+
+ /*
+ * The pg_index update will cause backends (including this one) to update
+ * relcache entries for the index itself, but we should also send a
+ * relcache inval on the parent table to force replanning of cached plans.
+ * Otherwise existing sessions might fail to use the new index where it
+ * would be useful. (Note that our earlier commits did not create reasons
+ * to replan; relcache flush on the index itself was sufficient.)
+ */
+ CacheInvalidateRelcacheByRelid(heaprelid.relId);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table.
+ */
+ UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+
+ return indexRelationId;
+}
+
+
+/*
+ * ReindexConcurrentIndexes
+ *
+ * Process REINDEX CONCURRENTLY for given list of indexes.
+ * Each reindexing step is done simultaneously for all the given
+ * indexes. If no list of indexes is given by the caller, all the
+ * indexes included in the relation will be reindexed.
+ */
+bool
+ReindexConcurrentIndexes(Oid heapOid, List *indexIds)
+{
+ Relation heapRelation;
+ List *concurrentIndexIds = NIL,
+ *indexLocks = NIL,
+ *realIndexIds = NIL;
+ ListCell *lc, *lc2;
+ LockRelId heapLockId;
+ LOCKTAG heapLocktag;
+ Snapshot snapshot;
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
*
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index rebuilt, a new index based on the
+ * same data as the former one; at this stage it is only registered in
+ * the catalogs and will be built afterwards. All these operations can be
+ * performed at the same time for every index of the parent relation,
+ * including the indexes of its toast relation.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
- for (i = 0; i < n_old_snapshots; i++)
+ /*
+ * The lock level used here should match the one taken on the index in
+ * index_concurrent_create(); this prevents schema changes on the
+ * relation.
+ */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /*
+ * Get the list of indexes from the relation if the caller has not given
+ * any. Invalid indexes cannot be reindexed concurrently: when scanning
+ * the relation ourselves we error out on them, while invalid indexes
+ * explicitly listed by the caller are simply bypassed with a warning.
+ */
+ if (indexIds == NIL)
{
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
+ ListCell *cell;
+ foreach(cell, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
- if (i > 0)
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\"",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ realIndexIds = lappend_oid(realIndexIds, cellOid);
+ }
+
+ /* Add also the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
{
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid, ShareUpdateExclusiveLock);
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
+ foreach(cell, RelationGetIndexList(toastRelation))
{
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\"",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ realIndexIds = lappend_oid(realIndexIds, cellOid);
}
- pfree(newer_snapshots);
+
+ heap_close(toastRelation, ShareUpdateExclusiveLock);
}
+ }
+ else
+ {
+ ListCell *cell;
+ List *filteredList = NIL;
+ foreach(cell, indexIds)
+ {
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
+ /* Invalid indexes are not reindexed */
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ filteredList = lappend_oid(filteredList, cellOid);
+
+ /* Close relation */
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ }
+ realIndexIds = filteredList;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (realIndexIds == NIL)
+ {
+ heap_close(heapRelation, NoLock);
+ return false;
+ }
+
+ /* The relation the indexes are based on cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, realIndexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent of the index, either the table itself or its toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /* Now open the concurrent index relation; a lock is needed on it as well */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lock IDs to protect each index and its concurrent copy from
+ * a drop, then close the relations. Note that palloc'd copies are
+ * stored in indexLocks, as pointers to the loop-local lockrelid would
+ * all end up aliasing the same storage.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ indexLocks = lappend(indexLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ indexLocks = lappend(indexLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
}
/*
- * Index can now be marked valid -- update its pg_index entry
+ * Save the heap lock tag for the upcoming waits on other backends that
+ * might conflict with this session.
*/
- pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+ heapLockId = heapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heapLocktag, heapLockId.dbId, heapLockId.relId);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
- indexTuple = SearchSysCacheCopy1(INDEXRELID,
- ObjectIdGetDatum(indexRelationId));
- if (!HeapTupleIsValid(indexTuple))
- elog(ERROR, "cache lookup failed for index %u", indexRelationId);
- indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, each old
+ * index and its concurrent copy, to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ LockRelationIdForSession(&heapLockId, ShareUpdateExclusiveLock);
- Assert(indexForm->indisready);
- Assert(!indexForm->indisvalid);
+ /* Lock each index and each concurrent index accordingly */
+ foreach(lc, indexLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
- indexForm->indisvalid = true;
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
- simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
- CatalogUpdateIndexes(pg_index, indexTuple);
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * We need to wait until no running transaction could have the table open
+ * with the old list of indexes. A concurrent build is then done for each
+ * concurrent index that will replace an old one. All those indexes share
+ * the same snapshot and are built within the same transaction.
+ */
+ WaitForVirtualLocks(heapLocktag);
- heap_close(pg_index, RowExclusiveLock);
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, realIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Index relation has been closed by the previous commit, so reopen it
+ * and grab the fields we need before closing it again.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ relOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of the new index */
+ index_concurrent_build(relOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the table
+ * must insert new entries into the index for insertions and non-HOT updates.
+ */
+ index_concurrent_mark(concurrentOid, INDEX_MARK_READY);
+ }
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
/*
- * The pg_index update will cause backends (including this one) to update
- * relcache entries for the index itself, but we should also send a
- * relcache inval on the parent table to force replanning of cached plans.
- * Otherwise existing sessions might fail to use the new index where it
- * would be useful. (Note that our earlier commits did not create reasons
- * to replan; relcache flush on the index itself was sufficient.)
+ * Commit this transaction to make the indisready updates of the
+ * concurrent indexes visible.
*/
- CacheInvalidateRelcacheByRelid(heaprelid.relId);
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates.
+ */
+ WaitForVirtualLocks(heapLocktag);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ Relation indexRelation = index_open(indOid, ShareUpdateExclusiveLock);
+ relOid = indexRelation->rd_index->indrelid;
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, lfirst_oid(lc), snapshot);
+ }
+
+ /*
+ * Concurrent indexes can now be marked valid -- update pg_index entries
+ */
+ foreach(lc, concurrentIndexIds)
+ index_concurrent_mark(lfirst_oid(lc), INDEX_MARK_VALID);
+
+ /*
+ * The concurrent indexes are now valid in the sense that they contain
+ * all currently interesting tuples. However, they might not contain
+ * tuples deleted just before the reference snapshot was taken, so we
+ * need to wait out the transactions that might have older snapshots
+ * than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index updates will cause backends to update their relcache
+ * entries for the concurrent indexes, but we should also send a relcache
+ * inval on the parent table to force replanning of cached plans.
+ */
+ CacheInvalidateRelcacheByRelid(heapLockId.relId);
/* we can now do away with our active snapshot */
PopActiveSnapshot();
@@ -891,12 +1091,107 @@ DefineIndex(IndexStmt *stmt,
/* And we can remove the validating snapshot too */
UnregisterSnapshot(snapshot);
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
/*
- * Last thing to do is release the session-level lock on the parent table.
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it unusable by
+ * other backends once this transaction is committed.
*/
- UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
- return indexRelationId;
+ /* Take reference snapshot used to wait for older snapshots */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Wait for old snapshots, like previously */
+ WaitForOldSnapshots(snapshot);
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, realIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Swap the old index and its concurrent copy */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /* Mark the old index as invalid */
+ index_concurrent_mark(indOid, INDEX_MARK_NOT_VALID);
+ }
+
+ /* We can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * Commit this transaction to make the old-index invalidation visible.
+ */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for the transactions that might still be using them.
+ */
+ WaitForVirtualLocks(heapLocktag);
+
+ /* Get fresh snapshot for this step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, realIndexIds)
+ index_concurrent_mark(lfirst_oid(lc), INDEX_MARK_NOT_READY);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible.
+ */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. The deletion needs to go through the dependency
+ * machinery (see index_concurrent_drop), or the dependencies of the old
+ * indexes would not be dropped with them.
+ */
+ index_concurrent_drop(realIndexIds);
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ UnlockRelationIdForSession(&heapLockId, ShareUpdateExclusiveLock);
+ foreach(lc, indexLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ return true;
}
@@ -1563,7 +1858,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1589,6 +1885,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1701,18 +2004,26 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
void
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
+
+ /* This is all for the non-concurrent case */
+ if (!concurrent)
+ {
+ reindex_index(indOid, false);
+ return;
+ }
- reindex_index(indOid, false);
+ /* Continue through REINDEX CONCURRENTLY */
+ ReindexConcurrentIndexes(heapOid, list_make1_oid(indOid));
}
/*
@@ -1774,18 +2085,139 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction can have the table open with the index marked as
+ * read-only for updates.
+ * To do this, inquire which xacts currently would conflict with ShareLock on
+ * the table referred to by the LOCKTAG -- ie, which ones have a lock that permits
+ * writing the table. Then wait for each of these xacts to commit or abort.
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+static void
+WaitForVirtualLocks(LOCKTAG heaplocktag)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+static void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots;
+ int j;
+ int k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
void
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexConcurrentIndexes(heapOid, NIL))
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1802,7 +2234,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
void
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1814,6 +2249,12 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /* CONCURRENTLY is not allowed when system catalogs are included */
+ if (concurrent && do_system)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9387ee9..0685ae4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3601,6 +3601,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 226b99a..b95d22d 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1899,6 +1899,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e4ff76e..9964700 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6666,15 +6666,16 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type qualified_name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
n->relation = $3;
n->name = NULL;
+ n->concurrent = $5;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
@@ -6682,9 +6683,10 @@ ReindexStmt:
n->relation = NULL;
n->do_system = true;
n->do_user = false;
+ n->concurrent = $5;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
@@ -6692,6 +6694,7 @@ ReindexStmt:
n->relation = NULL;
n->do_system = true;
n->do_user = true;
+ n->concurrent = $5;
$$ = (Node *)n;
}
;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 97376bb..aaf2631 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1262,15 +1262,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1282,8 +1286,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index eb417ce..4b89f61 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -19,6 +19,15 @@
#define DEFAULT_INDEX_TYPE "btree"
+typedef enum IndexMarkOperation
+{
+ INDEX_MARK_VALID,
+ INDEX_MARK_NOT_VALID,
+ INDEX_MARK_READY,
+ INDEX_MARK_NOT_READY
+} IndexMarkOperation;
+
+
/* Typedef for callback function for IndexBuildHeapScan */
typedef void (*IndexBuildCallback) (Relation index,
HeapTuple htup,
@@ -50,7 +59,22 @@ extern Oid index_create(Relation heapRelation,
bool initdeferred,
bool allow_system_table_mods,
bool skip_build,
- bool concurrent);
+ bool concurrent,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_mark(Oid indOid, IndexMarkOperation operation);
+
+extern void index_concurrent_swap(Oid indexOid1, Oid indexOid2);
+
+extern void index_concurrent_drop(List *indexIds);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 2c81b78..43dfa15 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern void ReindexIndex(RangeVar *indexRelation);
-extern void ReindexTable(RangeVar *relation);
+extern void ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern void ReindexTable(RangeVar *relation, bool concurrent);
extern void ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexConcurrentIndexes(Oid heapOid, List *indexIds);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 09b15e7..4d82033 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2510,6 +2510,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..6ffa488 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,40 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int, c2 text);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE concur_reindex_tab CONCURRENTLY; -- notice
+CREATE INDEX concur_reindex_tab1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_tab2 ON concur_reindex_tab(c2);
+INSERT INTO concur_reindex_tab VALUES (1,'a');
+INSERT INTO concur_reindex_tab VALUES (2,'a');
+REINDEX INDEX concur_reindex_tab1 CONCURRENTLY;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE pg_database CONCURRENTLY; -- not allowed for shared relations
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX DATABASE postgres CONCURRENTLY; -- not allowed for DATABASE
+ERROR: cannot reindex system concurrently
+REINDEX SYSTEM postgres CONCURRENTLY; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer |
+ c2 | text |
+Indexes:
+ "concur_reindex_tab1" btree (c1)
+ "concur_reindex_tab2" btree (c2)
+
+DROP TABLE concur_reindex_tab;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..7b3b036 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,30 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int, c2 text);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE concur_reindex_tab CONCURRENTLY; -- notice
+CREATE INDEX concur_reindex_tab1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_tab2 ON concur_reindex_tab(c2);
+INSERT INTO concur_reindex_tab VALUES (1,'a');
+INSERT INTO concur_reindex_tab VALUES (2,'a');
+REINDEX INDEX concur_reindex_tab1 CONCURRENTLY;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+COMMIT;
+REINDEX TABLE pg_database CONCURRENTLY; -- not allowed for shared relations
+REINDEX DATABASE postgres CONCURRENTLY; -- not allowed for DATABASE
+REINDEX SYSTEM postgres CONCURRENTLY; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab;
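Stepping back from the diff for a moment, the catalog flag transitions the patch drives can be sketched as a tiny standalone C model. This is a toy illustration only; the struct and function names below are invented and do not appear in the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of an index's pg_index flags during REINDEX CONCURRENTLY. */
struct toy_index
{
	bool		isready;	/* indisready: accepts inserts */
	bool		isvalid;	/* indisvalid: usable for queries */
	bool		dropped;
};

/*
 * Walk the old and new index through the phases described in this thread.
 * Each comment corresponds to one (or more) of the patch's transactions.
 */
static void
toy_reindex_concurrently(struct toy_index *oldidx, struct toy_index *newidx)
{
	/* Phase 1: new index enters the catalogs, invalid and not ready */
	newidx->isready = false;
	newidx->isvalid = false;
	newidx->dropped = false;
	/* Phase 2: first table scan builds it; it becomes ready for inserts */
	newidx->isready = true;
	/* Phase 3: second table scan validates it */
	newidx->isvalid = true;
	/* Phase 4: names are swapped, old index is marked invalid */
	oldidx->isvalid = false;
	/* Phase 5: old index is marked not ready */
	oldidx->isready = false;
	/* Phase 6: old index is dropped */
	oldidx->dropped = true;
}
```

Phases 2 and 3 are where the two table scans happen; the other phases are catalog bookkeeping separated by transaction commits and waits.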
Hi all,
It has been a long time since this thread was last updated...
Please find attached the version 3 of the patch for support of REINDEX
CONCURRENTLY.
The code has been realigned with master up to commit da07a1e (6th December).
Here are the things modified:
- Improve code to use index_set_state_flag introduced by Tom in commit
3c84046
- One transaction is used for each index swap (N transactions if N indexes
reindexed at the same time)
- Fixed a bug to drop the old indexes concurrently at the end of the process
The index swap is managed by switching the names of the new and old indexes
using RenameRelationInternal several times. This API takes an exclusive
lock on the relation that is renamed until the end of the transaction
managing the swap. This has been discussed in this thread and other
threads, but it is important to mention it for people who have not read the
patch.
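To make the swap mechanics concrete, here is a toy, backend-free sketch of the three renames. The struct, the helper names and the simplified rename semantics are invented for illustration; the real code calls RenameRelationInternal with a CommandCounterIncrement after each rename:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64

/* Toy stand-in for a relation: an OID and its pg_class name. */
struct toy_rel
{
	unsigned int oid;
	char		name[NAMEDATALEN];
};

/* Stand-in for RenameRelationInternal(). */
static void
toy_rename(struct toy_rel *rel, const char *newname)
{
	snprintf(rel->name, NAMEDATALEN, "%s", newname);
}

/* Mirror of the three renames performed for each index swap. */
static void
toy_swap_names(struct toy_rel *newidx, struct toy_rel *oldidx)
{
	char		nameNew[NAMEDATALEN];
	char		nameOld[NAMEDATALEN];
	char		nameTemp[NAMEDATALEN];

	snprintf(nameNew, NAMEDATALEN, "%s", newidx->name);
	snprintf(nameOld, NAMEDATALEN, "%s", oldidx->name);

	/* 1. Move the old index out of the way, to a temporary name */
	snprintf(nameTemp, NAMEDATALEN, "cct_%u", oldidx->oid);
	toy_rename(oldidx, nameTemp);
	/* 2. Give the new index the old index's name */
	toy_rename(newidx, nameOld);
	/* 3. Give the old index the new index's former name */
	toy_rename(oldidx, nameNew);
}
```

After the three renames the new index carries the original index name and the old index carries the _cct name, ready to be marked invalid and dropped.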
There are still two things that are missing in this patch, but I would like
to have more feedback before moving forward:
- REINDEX CONCURRENTLY needs tests in src/test/isolation
- There is still a problem with toast indexes. If the concurrent reindex of
a toast index fails for one reason or another, the catalogs will end up with
invalid toast index entries. I am still wondering how to clean that up. Any
ideas?
Comments?
--
Michael Paquier
http://michael.otacoo.com
Attachments:
20121207_reindex_concurrently_v3.patch
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..2931329 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ] [ CONCURRENTLY ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,10 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will not perform a concurrent build if <literal>
+ CONCURRENTLY</> is not specified. To build the index without interfering
+ with production you should drop the index and reissue the <command>CREATE
+ INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> command.
</para>
</listitem>
@@ -139,6 +140,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +247,93 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index that will replace the one
+ being rebuilt is first entered into the system catalogs in one
+ transaction. Then two table scans occur in two more transactions to
+ build the new index and to make it valid for the other backends. Once
+ this is done, the old and new indexes are swapped, and the old index is
+ marked as invalid in another transaction. Finally, two additional
+ transactions are used to mark the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the invalid
+ index and run <command>REINDEX CONCURRENTLY</> again. The concurrent
+ index created during the processing has a name ending with the
+ suffix cct.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ </refsect2>
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 66012ac..90deb5c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -671,6 +671,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that duplicates an existing one, as
+ * done during a concurrent reindex operation. This index can also be a
+ * toast index. Sufficient locks are normally already taken on the
+ * related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -694,7 +698,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -730,26 +735,31 @@ index_create(Relation heapRelation,
if (!allow_system_table_mods &&
IsSystemRelation(heapRelation) &&
- IsNormalProcessingMode())
+ IsNormalProcessingMode() &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("user-defined indexes on system catalog tables are not supported")));
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1095,6 +1105,243 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new one in a concurrent context. For the time being
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char nameNew[NAMEDATALEN],
+ nameOld[NAMEDATALEN],
+ nameTemp[NAMEDATALEN];
+
+ /* The new index is going to use the name of the old index */
+ snprintf(nameNew, NAMEDATALEN, "%s", get_rel_name(newIndexOid));
+ snprintf(nameOld, NAMEDATALEN, "%s", get_rel_name(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ snprintf(nameTemp, NAMEDATALEN, "cct_%d", oldIndexOid);
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+}
+
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a list of indexes in a concurrent process. Deletion has to be done
+ * through performMultipleDeletions, otherwise the dependencies of the
+ * indexes are not dropped.
+ */
+void
+index_concurrent_drop(List *indexIds)
+{
+ ListCell *lc;
+ ObjectAddresses *objects = new_object_addresses();
+
+ Assert(indexIds != NIL);
+
+ /* Scan the list of indexes and build object list for normal indexes */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+
+ /* Register constraint or index for drop */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Add object to list */
+ add_exact_object_address(&object, objects);
+ }
+
+ /* Perform deletion for normal and toast indexes */
+ performMultipleDeletions(objects,
+ PERFORM_DELETION_CONCURRENTLY,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1325,7 +1572,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1464,13 +1710,7 @@ index_drop(Oid indexId, bool concurrent)
* not check for that. Also, prepared xacts are not reported, which
* is fine since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* No more predicate locks will be acquired on this index, and we're
@@ -1514,13 +1754,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 2979819..5181dbc 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, true);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75f9ff1..2bcf5b5 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,447 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexConcurrentIndexes
+ *
+ * Process REINDEX CONCURRENTLY for given list of indexes.
+ * Each reindexing step is done simultaneously for all the given
+ * indexes. If no list of indexes is given by the caller, all the
+ * indexes included in the relation will be reindexed.
+ */
+bool
+ReindexConcurrentIndexes(Oid heapOid, List *indexIds)
+{
+ Relation heapRelation;
+ List *concurrentIndexIds = NIL,
+ *indexLocks = NIL,
+ *realIndexIds = NIL;
+ ListCell *lc, *lc2;
+ LockRelId heapLockId;
+ LOCKTAG heapLocktag;
+ Snapshot snapshot;
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process for rebuilding concurrently the indexes.
+ * We need first to create an index which is based on the same data
+ * as the former index except that it will be only registered in catalogs
+ * and will be built after. It is possible to perform all the operations
+ * on all the indexes at the same time for a parent relation including
+ * its indexes for toast relation.
+ */
+
+ /*
+ * The lock level used here should match the index lock taken in
+ * index_concurrent_create(); this prevents schema changes on the relation.
+ */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /*
+ * Get the list of indexes from the relation if the caller has not given
+ * anything. Invalid indexes cannot be reindexed concurrently: an error is
+ * raised here, while explicitly-given invalid indexes are bypassed below.
+ */
+ if (indexIds == NIL)
+ {
+ ListCell *cell;
+ foreach(cell, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\"",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ realIndexIds = lappend_oid(realIndexIds, cellOid);
+ }
+
+ /* Add also the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid, ShareUpdateExclusiveLock);
+
+ foreach(cell, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\"",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ realIndexIds = lappend_oid(realIndexIds, cellOid);
+ }
+
+ heap_close(toastRelation, ShareUpdateExclusiveLock);
+ }
+ }
+ else
+ {
+ ListCell *cell;
+ List *filteredList = NIL;
+ foreach(cell, indexIds)
+ {
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
+
+ /* Invalid indexes are not reindexed */
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ filteredList = lappend_oid(filteredList, cellOid);
+
+ /* Close relation */
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ }
+ realIndexIds = filteredList;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (realIndexIds == NIL)
+ {
+ heap_close(heapRelation, NoLock);
+ return false;
+ }
+
+ /* The relation on which the indexes are based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, realIndexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which may be the table itself or its toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ NIL,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /* Now open the concurrent index relation, a lock is also needed on it */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lock IDs to protect each relation involved from being dropped,
+ * then close the relations. Each entry needs its own palloc'd copy, as
+ * the list outlives this loop iteration.
+ */
+ lockrelid = palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ indexLocks = lappend(indexLocks, lockrelid);
+ lockrelid = palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ indexLocks = lappend(indexLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock tag for the following waits on other backends that
+ * might conflict with this session.
+ */
+ heapLockId = heapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heapLocktag, heapLockId.dbId, heapLockId.relId);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the old
+ * indexes and their concurrent copies, to ensure that none of them are
+ * dropped until the operation is done.
+ */
+ LockRelationIdForSession(&heapLockId, ShareUpdateExclusiveLock);
+
+ /* Lock each index and each concurrent index accordingly */
+ foreach(lc, indexLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * We need to wait until no running transactions could have the table open with
+ * the old list of indexes. A concurrent build is done for each concurrent
+ * index that will replace the old indexes. All those indexes share the same
+ * snapshot and they are built in the same transaction.
+ */
+ WaitForVirtualLocks(heapLocktag, ShareLock);
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, realIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ relOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(relOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the table
+ * must insert new entries into the index for insertions and non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+ }
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions that
+ * might have occurred in the parent table, and are marked as valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates.
+ */
+ WaitForVirtualLocks(heapLocktag, ShareLock);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ Relation indexRelation = index_open(indOid, ShareUpdateExclusiveLock);
+ relOid = indexRelation->rd_index->indrelid;
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, lfirst_oid(lc), snapshot);
+ }
+
+ /*
+ * Concurrent indexes can now be marked valid -- update pg_index entries
+ */
+ foreach(lc, concurrentIndexIds)
+ index_set_state_flags(lfirst_oid(lc), INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent indexes are now valid as they contain all the necessary
+ * tuples. However, they might not contain tuples deleted just before the
+ * reference snapshot was taken, so we need to wait out the transactions
+ * that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index updates will cause backends to refresh their entries for
+ * the concurrent indexes, but it is necessary to do the same for the
+ * parent relation, whose cached plans may reference the old indexes.
+ */
+ CacheInvalidateRelcacheByRelid(heapLockId.relId);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, realIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Mark the old index as invalid, this needs to be done as the first
+ * action in this transaction.
+ */
+ index_set_state_flags(indOid, INDEX_DROP_CLEAR_VALID);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Open the old index and its parent relation so that the relcache of
+ * the parent table can be invalidated.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /* Continue process inside a new transaction block */
+ StartTransactionCommand();
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for the transactions that might still use them.
+ */
+ WaitForVirtualLocks(heapLocktag, ShareLock);
+
+ /* Get fresh snapshot for this step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, realIndexIds)
+ index_set_state_flags(lfirst_oid(lc), INDEX_DROP_SET_DEAD);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible.
+ */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion,
+ * or else the dependencies of the old indexes will not be dropped.
+ */
+ index_concurrent_drop(realIndexIds);
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ UnlockRelationIdForSession(&heapLockId, ShareUpdateExclusiveLock);
+ foreach(lc, indexLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ return true;
+}
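+
+As a rough sketch, the six phases implemented above are close to this manual
+SQL sequence (table, column and index names hypothetical; the function
+additionally handles toast indexes, constraints and session-level locking):
+
+```sql
+-- Hand-rolled approximation of REINDEX INDEX ind CONCURRENTLY:
+CREATE INDEX CONCURRENTLY ind_cct ON tab (col);  -- phases 1-3: create, build, validate
+ALTER INDEX ind RENAME TO ind_old;               -- phase 4: swap the names
+ALTER INDEX ind_cct RENAME TO ind;
+ALTER INDEX ind_old RENAME TO ind_cct;
+DROP INDEX CONCURRENTLY ind_cct;                 -- phases 5-6: mark dead, then drop
+```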
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1877,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1904,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2023,26 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
void
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* This is all for the non-concurrent case */
+ if (!concurrent)
+ {
+ reindex_index(indOid, false);
+ return;
+ }
+
+ /* Continue through REINDEX CONCURRENTLY */
+ ReindexConcurrentIndexes(heapOid, list_make1_oid(indOid));
}
/*
@@ -1745,18 +2104,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
void
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary, then we are done */
+ if (concurrent)
+ {
+ if (!ReindexConcurrentIndexes(heapOid, NIL))
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1773,7 +2144,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
void
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1785,6 +2159,12 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /* CONCURRENTLY is not allowed when system catalogs are included */
+ if (concurrent && do_system)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9387ee9..0685ae4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3601,6 +3601,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 95a95f4..cdea86a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1840,6 +1840,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ad98b36..db3a5f8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6670,15 +6670,16 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type qualified_name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
n->relation = $3;
n->name = NULL;
+ n->concurrent = $5;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
@@ -6686,9 +6687,10 @@ ReindexStmt:
n->relation = NULL;
n->do_system = true;
n->do_user = false;
+ n->concurrent = $5;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
@@ -6696,6 +6698,7 @@ ReindexStmt:
n->relation = NULL;
n->do_system = true;
n->do_user = true;
+ n->concurrent = $5;
$$ = (Node *)n;
}
;
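With the opt_concurrently additions above, the grammar accepts for instance the
following statements (the DATABASE and SYSTEM forms are parsed here but
rejected later in ReindexDatabase, since system catalogs cannot be reindexed
concurrently):

```sql
REINDEX INDEX ind CONCURRENTLY;
REINDEX TABLE tab CONCURRENTLY;
REINDEX DATABASE mydb CONCURRENTLY;  -- parsed, then fails with an error
```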
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 94f58a9..40dedde 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,114 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction holds a lock on the relation referred to by the
+ * LOCKTAG that would conflict with the given lock mode. To do this, inquire
+ * which xacts currently would conflict with that lock mode on the relation,
+ * then wait for each of these xacts to commit or abort.
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a42b8e9..9424140 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1255,15 +1255,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1275,8 +1279,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index b96099f..539fc0a 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,20 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid indexOid1, Oid indexOid2);
+
+extern void index_concurrent_drop(List *IndexIds);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 2c81b78..43dfa15 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern void ReindexIndex(RangeVar *indexRelation);
-extern void ReindexTable(RangeVar *relation);
+extern void ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern void ReindexTable(RangeVar *relation, bool concurrent);
extern void ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexConcurrentIndexes(Oid heapOid, List *indexIds);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 8834499..46bc532 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2511,6 +2511,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 9933dad..2e2d9dc 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..26bd952 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,43 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE concur_reindex_tab CONCURRENTLY; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE INDEX concur_reindex_tab1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_tab2 ON concur_reindex_tab(c2);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX concur_reindex_tab1 CONCURRENTLY;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE pg_database CONCURRENTLY; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX DATABASE postgres CONCURRENTLY; -- not allowed for DATABASE
+ERROR: cannot reindex system concurrently
+REINDEX SYSTEM postgres CONCURRENTLY; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer |
+ c2 | text |
+Indexes:
+ "concur_reindex_tab1" btree (c1)
+ "concur_reindex_tab2" btree (c2)
+
+DROP TABLE concur_reindex_tab;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..be9c5cc 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,31 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE concur_reindex_tab CONCURRENTLY; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE INDEX concur_reindex_tab1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_tab2 ON concur_reindex_tab(c2);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX concur_reindex_tab1 CONCURRENTLY;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+COMMIT;
+REINDEX TABLE pg_database CONCURRENTLY; -- no shared relation
+REINDEX DATABASE postgres CONCURRENTLY; -- not allowed for DATABASE
+REINDEX SYSTEM postgres CONCURRENTLY; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab;
On 7 December 2012 12:37, Michael Paquier <michael.paquier@gmail.com> wrote:
There are still two things that are missing in this patch, but I would like
to have more feedback before moving forward:
- REINDEX CONCURRENTLY needs tests in src/test/isolation
Yes, it needs those
- There is still a problem with toast indexes. If the concurrent reindex of
a toast index fails for a reason or another, pg_relation will finish with
invalid toast index entries. I am still wondering about how to clean up
that. Any ideas?
Build another toast index, rather than reindexing the existing one,
then just use the new oid.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2012-12-07 21:37:06 +0900, Michael Paquier wrote:
Hi all,
Long time this thread has not been updated...
Please find attached the version 3 of the patch for support of REINDEX
CONCURRENTLY.
The code has been realigned with master up to commit da07a1e (6th December).
Here are the things modified:
- Improve code to use index_set_state_flag introduced by Tom in commit
3c84046
- One transaction is used for each index swap (N transactions if N indexes
reindexed at the same time)
- Fixed a bug to drop the old indexes concurrently at the end of the process
The index swap is managed by switching the names of the new and old indexes
using RenameRelationInternal several times. This API takes an exclusive
lock on the relation that is renamed until the end of the transaction
managing the swap. This has been discussed in this thread and other
threads, but it is important to mention it for people who have not read the
patch.
Won't working like this cause problems when dependencies towards that
index exist? E.g. an index-based constraint?
As you have an access exclusive lock you should be able to just switch
the relfilenodes of both and concurrently drop the *_cci index with the
old relfilenode afterwards, that would preserve the index states.
Right now I think clearing checkxmin is all you would need to other than
that. We know we don't need it in the concurrent context.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
On 7 December 2012 12:37, Michael Paquier <michael.paquier@gmail.com> wrote:
- There is still a problem with toast indexes. If the concurrent reindex of
a toast index fails for a reason or another, pg_relation will finish with
invalid toast index entries. I am still wondering about how to clean up
that. Any ideas?
Build another toast index, rather than reindexing the existing one,
then just use the new oid.
Um, I don't think you can swap in a new toast index OID without taking
exclusive lock on the parent table at some point.
One sticking point is the need to update pg_class.reltoastidxid. I
wonder how badly we need that field though --- could we get rid of it
and treat toast-table indexes just the same as normal ones? (Whatever
code is looking at the field could perhaps instead rely on
RelationGetIndexList.)
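As a sketch of that idea, the toast index can already be located from the catalogs without consulting reltoastidxid; 'some_table' below is a placeholder:

```sql
-- Hypothetical sketch: find a table's toast index via pg_index
-- instead of relying on pg_class.reltoastidxid.
SELECT i.indexrelid::regclass
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.oid = 'some_table'::regclass;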
regards, tom lane
On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On 7 December 2012 12:37, Michael Paquier <michael.paquier@gmail.com> wrote:
- There is still a problem with toast indexes. If the concurrent reindex of
a toast index fails for a reason or another, pg_relation will finish with
invalid toast index entries. I am still wondering about how to clean up
that. Any ideas?
Build another toast index, rather than reindexing the existing one,
then just use the new oid.
Thats easier said than done in the first place. toast_save_datum()
explicitly opens/modifies the one index it needs and updates it.
Um, I don't think you can swap in a new toast index OID without taking
exclusive lock on the parent table at some point.
The whole swapping issue isn't solved satisfyingly as a whole yet :(.
If we just swap the index relfilenodes in the pg_index entries itself,
we wouldn't need to modify the main table's pg_class at all.
One sticking point is the need to update pg_class.reltoastidxid. I
wonder how badly we need that field though --- could we get rid of it
and treat toast-table indexes just the same as normal ones? (Whatever
code is looking at the field could perhaps instead rely on
RelationGetIndexList.)
We could probably just set Relation->rd_toastidx when building the
relcache entry for the toast table so it doesn't have to search the
whole indexlist all the time. Not that that would be too big, but...
Greetings,
Andres Freund
On 7 December 2012 17:19, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On 7 December 2012 12:37, Michael Paquier <michael.paquier@gmail.com> wrote:
- There is still a problem with toast indexes. If the concurrent reindex of
a toast index fails for a reason or another, pg_relation will finish with
invalid toast index entries. I am still wondering about how to clean up
that. Any ideas?
Build another toast index, rather than reindexing the existing one,
then just use the new oid.
That's easier said than done in the first place. toast_save_datum()
explicitly opens/modifies the one index it needs and updates it.
Well, yeh, I know what I'm saying: it would need to maintain 2 indexes
for a while.
The point is to use the same trick we do manually now, which works
fine for normal indexes and can be made to work for toast indexes
also.
Um, I don't think you can swap in a new toast index OID without taking
exclusive lock on the parent table at some point.The whole swapping issue isn't solved satisfyingly as whole yet :(.
If we just swap the index relfilenodes in the pg_index entries itself,
we wouldn't need to modify the main table's pg_class at all.
yes
On Fri, Dec 7, 2012 at 10:33 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 7 December 2012 12:37, Michael Paquier <michael.paquier@gmail.com>
wrote:
- There is still a problem with toast indexes. If the concurrent reindex
of
a toast index fails for a reason or another, pg_relation will finish with
invalid toast index entries. I am still wondering about how to clean up
that. Any ideas?
Build another toast index, rather than reindexing the existing one,
then just use the new oid.
Hum? The patch already does that. It creates concurrently a new index which
is a duplicate of the existing one, then the old and new indexes are
swapped. Finally the old index is dropped concurrently.
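For an ordinary (non-toast) index, the sequence described above is roughly the trick a DBA can already perform by hand today; a hedged sketch, with entirely made-up index and table names:

```sql
-- Hypothetical manual equivalent for a plain index "idx" on tab(c1):
CREATE INDEX CONCURRENTLY idx_cct ON tab (c1);  -- build the duplicate
BEGIN;                                          -- short exclusive-lock window
ALTER INDEX idx RENAME TO idx_old;
ALTER INDEX idx_cct RENAME TO idx;
COMMIT;
DROP INDEX CONCURRENTLY idx_old;                -- drop the old copy
```

The renames take an exclusive lock on each index, but only briefly; the expensive build and validation phases happen concurrently.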
The problem I still see is the following one:
If a toast index, or a relation having a toast index, is being reindexed
concurrently, and the server crashes during the process, there will be
invalid toast indexes in the server. If the crash happens before the swap,
the new toast index is invalid. If the crash happens after the swap, the
old toast index is invalid.
I am not sure the user is able to clean up such invalid toast indexes
manually as they are not visible to him.
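Such leftovers can at least be spotted from the catalogs; a hypothetical check (not part of the patch) that lists invalid indexes, including those living in toast schemas:

```sql
-- Hypothetical: list invalid indexes left behind by a failed
-- concurrent build/reindex, including those on toast tables.
SELECT i.indexrelid::regclass AS index, n.nspname AS schema
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
```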
--
Michael Paquier
http://michael.otacoo.com
On Sat, Dec 8, 2012 at 2:19 AM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On 7 December 2012 12:37, Michael Paquier <michael.paquier@gmail.com>
wrote:
- There is still a problem with toast indexes. If the concurrent
reindex of
a toast index fails for a reason or another, pg_relation will finish
with
invalid toast index entries. I am still wondering about how to clean
up
that. Any ideas?
Build another toast index, rather than reindexing the existing one,
then just use the new oid.
That's easier said than done in the first place. toast_save_datum()
explicitly opens/modifies the one index it needs and updates it.
Um, I don't think you can swap in a new toast index OID without taking
exclusive lock on the parent table at some point.
The whole swapping issue isn't solved satisfyingly as a whole yet :(.
If we just swap the index relfilenodes in the pg_index entries itself,
we wouldn't need to modify the main table's pg_class at all.
I think you are mistaken here; relfilenode is a column of pg_class and not
pg_index.
So whatever the method used for swapping: relfilenode switch or relname
switch, you need to modify the pg_class entry of the old and new indexes.
On Sat, Dec 8, 2012 at 2:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Um, I don't think you can swap in a new toast index OID without taking
exclusive lock on the parent table at some point.
One sticking point is the need to update pg_class.reltoastidxid. I
wonder how badly we need that field though --- could we get rid of it
and treat toast-table indexes just the same as normal ones? (Whatever
code is looking at the field could perhaps instead rely on
RelationGetIndexList.)
Yes. reltoastidxid refers to the index of the toast table so it is
necessary to take a lock on the parent relation in this case. I haven't
thought of that. I also do not really know how far this is used by the
toast process, but thinking of safety, taking a lock on the parent
relation would be better.
For a normal index, locking the parent table is not necessary as we do not
need to modify anything in the parent relation entry in pg_class.
On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
On Sat, Dec 8, 2012 at 2:19 AM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2012-12-07 12:01:52 -0500, Tom Lane wrote:
Simon Riggs <simon@2ndQuadrant.com> writes:
On 7 December 2012 12:37, Michael Paquier <michael.paquier@gmail.com>
wrote:
- There is still a problem with toast indexes. If the concurrent
reindex of
a toast index fails for a reason or another, pg_relation will finish
with
invalid toast index entries. I am still wondering about how to clean
up
that. Any ideas?
Build another toast index, rather than reindexing the existing one,
then just use the new oid.
That's easier said than done in the first place. toast_save_datum()
explicitly opens/modifies the one index it needs and updates it.
Um, I don't think you can swap in a new toast index OID without taking
exclusive lock on the parent table at some point.
The whole swapping issue isn't solved satisfyingly as a whole yet :(.
If we just swap the index relfilenodes in the pg_index entries itself,
we wouldn't need to modify the main table's pg_class at all.
I think you are mistaken here; relfilenode is a column of pg_class and not
pg_index.
So whatever the method used for swapping: relfilenode switch or relname
switch, you need to modify the pg_class entry of the old and new indexes.
The point is that with a relname switch the pg_class.oid of the index
changes. Which is a bad idea because it will possibly be referred to by
pg_depend entries. Relfilenodes - which certainly live in pg_class too,
thats not the point - aren't referred to externally though. So if
everything else in pg_class/pg_index stays the same a relfilenode switch
imo saves you a lot of trouble.
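The point about external references can be illustrated with a catalog query; 'some_index' is a placeholder name, not something from the patch:

```sql
-- Sketch: pg_depend records reference the index's pg_class.oid,
-- not its relfilenode, so swapping relfilenodes leaves them intact.
SELECT classid::regclass, objid, deptype
FROM pg_depend
WHERE refclassid = 'pg_class'::regclass
  AND refobjid = 'some_index'::regclass;
```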
Greetings,
Andres Freund
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
So whatever the method used for swapping: relfilenode switch or relname
switch, you need to modify the pg_class entry of the old and new indexes.
The point is that with a relname switch the pg_class.oid of the index
changes. Which is a bad idea because it will possibly be referred to by
pg_depend entries. Relfilenodes - which certainly live in pg_class too,
thats not the point - aren't referred to externally though. So if
everything else in pg_class/pg_index stays the same a relfilenode switch
imo saves you a lot of trouble.
I do not believe that it is safe to modify an index's relfilenode *nor*
its OID without exclusive lock; both of those are going to be in use to
identify and access the index in concurrent sessions. The only things
we could possibly safely swap in a REINDEX CONCURRENTLY are the index
relnames, which are not used for identification by the system itself.
(I think. It's possible that even this breaks something.)
Even then, any such update of the pg_class rows is dependent on
switching to MVCC-style catalog access, which frankly is pie in the sky
at the moment; the last time pgsql-hackers talked seriously about that,
there seemed to be multiple hard problems besides mere performance.
If you want to wait for that, it's a safe bet that we won't see this
feature for a few years.
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly. Or just generate a new name using the
same rules that CREATE INDEX would when no name is specified. Yeah,
it's a hack, but what about the CONCURRENTLY commands isn't a hack?
regards, tom lane
On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
On 2012-12-08 21:24:47 +0900, Michael Paquier wrote:
So whatever the method used for swapping: relfilenode switch or relname
switch, you need to modify the pg_class entry of the old and new indexes.
The point is that with a relname switch the pg_class.oid of the index
changes. Which is a bad idea because it will possibly be referred to by
pg_depend entries. Relfilenodes - which certainly live in pg_class too,
thats not the point - aren't referred to externally though. So if
everything else in pg_class/pg_index stays the same a relfilenode switch
imo saves you a lot of trouble.
I do not believe that it is safe to modify an index's relfilenode *nor*
its OID without exclusive lock; both of those are going to be in use to
identify and access the index in concurrent sessions. The only things
we could possibly safely swap in a REINDEX CONCURRENTLY are the index
relnames, which are not used for identification by the system itself.
(I think. It's possible that even this breaks something.)
Well, the patch currently *does* take an exclusive lock in an extra
transaction just for the swapping. In that case it should actually be
safe.
Although that obviously removes part of the usefulness of the feature.
Even then, any such update of the pg_class rows is dependent on
switching to MVCC-style catalog access, which frankly is pie in the sky
at the moment; the last time pgsql-hackers talked seriously about that,
there seemed to be multiple hard problems besides mere performance.
If you want to wait for that, it's a safe bet that we won't see this
feature for a few years.
Yea :(
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly. Or just generate a new name using the
same rules that CREATE INDEX would when no name is specified. Yeah,
it's a hack, but what about the CONCURRENTLY commands isn't a hack?
I have no problem with ending up with a new name or something like
that. If that is what it takes: fine, no problem.
The issue I raised above is just about keeping the pg_depend entries
pointing to something valid... And not changing the indexes pg_class.oid
seems to be the easiest solution for that.
I have some vague schemes in my head that we can solve the swapping issue
with 3 entries for the index in pg_class, but they all only seem to come
to my head while I don't have anything to write them down, so they are
probably bogus.
Greetings,
Andres Freund
Andres Freund <andres@2ndquadrant.com> writes:
The issue I raised above is just about keeping the pg_depend entries
pointing to something valid... And not changing the indexes pg_class.oid
seems to be the easiest solution for that.
Yeah, we would have to update pg_depend, pg_constraint, maybe some other
places if we go with that. I think that would be safe because we'd be
holding ShareRowExclusive lock on the parent table throughout, so nobody
else should be doing anything that's critically dependent on seeing such
rows. But it'd be a lot of ugly code, for sure.
Maybe the best way is to admit that we need a short-term exclusive lock
for the swapping step. Or we could wait for MVCC catalog access ...
regards, tom lane
On 8 December 2012 15:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Maybe the best way is to admit that we need a short-term exclusive lock
for the swapping step.
Which wouldn't be so bad if this is just for the toast index, since in
many cases the index itself is completely empty anyway, which must
offer opportunities for optimization.
Or we could wait for MVCC catalog access ...
If there was a published design for that, it would help believe in it more.
Do you think one exists?
Simon Riggs <simon@2ndQuadrant.com> writes:
On 8 December 2012 15:14, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Or we could wait for MVCC catalog access ...
If there was a published design for that, it would help believe in it more.
Do you think one exists?
Well, there have been discussion threads about it in the past. I don't
recall whether any insoluble issues were raised. I think the concerns
were mostly about performance, if we start taking many more snapshots
than we have in the past.
The basic idea isn't hard: anytime a catalog scan is requested with
SnapshotNow, replace that with a freshly taken MVCC snapshot. I think
we'd agreed that this could safely be optimized to "only take a new
snapshot if any new heavyweight lock has been acquired since the last
one". But that'll still be a lot of snapshots, and we know the
snapshot-getting code is a bottleneck already. I think the discussions
mostly veered off at this point into how to make snapshots cheaper.
regards, tom lane
I have updated the patch (v4) to take care of updating reltoastidxid for
toast parent relations at the swap step by using index_update_stats. In
prior versions of the patch this was done when concurrent index was built,
leading to toast relations using invalid indexes if there was a failure
before the swap phase. The update of reltoastidxid of the toast relation is
done with RowExclusiveLock.
I also added a couple of tests in src/test/isolation. Btw, as the swap
step currently uses AccessExclusiveLock to switch the old and new
relnames, there is little point in running them for now...
On Sat, Dec 8, 2012 at 11:55 PM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly. Or just generate a new name using the
same rules that CREATE INDEX would when no name is specified. Yeah,
it's a hack, but what about the CONCURRENTLY commands isn't a hack?
I have no problem with ending up with a new name or something like
that. If that is what it takes: fine, no problem.
For the indexes that are created internally by the system like toast or
internal primary keys this is acceptable. However in the case of indexes
that have been created externally I do not think it is acceptable as this
impacts the user that created those indexes with a specific name.
pg_reorg itself also uses the relname switch method when rebuilding indexes,
and people using it did not complain about the heavy lock taken at the swap
phase, but praised it, as it really helps in reducing the lock taken during
index rebuild and validation, which are the phases that take the largest
amount of time in the REINDEX process, btw.
Attachments:
20121210_reindex_concurrently_v4.patchapplication/octet-stream; name=20121210_reindex_concurrently_v4.patchDownload
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..2931329 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ] [ CONCURRENTLY ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,10 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will not perform a concurrent build if <literal>
+ CONCURRENTLY</> is not specified. To build the index without interfering
+ with production you should drop the index and reissue the <command>CREATE
+ INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> command.
</para>
</listitem>
@@ -139,6 +140,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +247,93 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete, as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
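+
+ <para>
+ For example, to concurrently rebuild all the indexes of a table named
+ <literal>tab</>:
+
+<programlisting>
+REINDEX TABLE tab CONCURRENTLY;
+</programlisting>
+ </para>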
+
+ <para>
+ In a concurrent index build, a new index that will replace the one
+ being rebuilt is actually entered into the system catalogs in one
+ transaction, then two table scans occur in two more transactions to make
+ the new index valid for the other backends. Once this is done, the old
+ and new indexes are swapped, and the old index is marked as invalid
+ in a third transaction. Finally, two additional transactions are used to
+ mark the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the invalid
+ index and perform <command>REINDEX CONCURRENTLY</> once again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>.
+ </para>
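+
+ <para>
+ For example, supposing that the invalid index is the
+ <literal>idx_cct</> index shown above, it can be dropped and the rebuild
+ attempted again with:
+
+<programlisting>
+DROP INDEX idx_cct;
+REINDEX INDEX idx CONCURRENTLY;
+</programlisting>
+ </para>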
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ never allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ </refsect2>
</refsect1>
<refsect1>
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 11086e2..2975bd7 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1141,7 +1141,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index d93d273..740e9f0 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2642,7 +2642,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 66012ac..4220398 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -671,6 +671,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create a duplicate of an existing index, for use
+ * during a concurrent reindex operation. This index can also be a toast
+ * index. Sufficient locks are normally already taken on the related
+ * relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -694,7 +698,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -730,26 +735,31 @@ index_create(Relation heapRelation,
if (!allow_system_table_mods &&
IsSystemRelation(heapRelation) &&
- IsNormalProcessingMode())
+ IsNormalProcessingMode() &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("user-defined indexes on system catalog tables are not supported")));
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation, in which case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1083,7 +1093,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1095,6 +1105,270 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one, for use in a concurrent operation.
+ * The index is only inserted into the catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed, so that only schema changes are prevented.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /*
+ * Now build the index. In the case of a parent relation being a toast
+ * relation, its reltoastidxid is updated when index_concurrent_swap is called.
+ */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old index with the new one in a concurrent context. For the time
+ * being what is done here is switching the relation names of the indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ * For toast indexes, it is also necessary to modify reltoastidxid of the
+ * parent relation, so we also need to take RowExclusiveLock on this relation
+ * until the end of the transaction block.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char nameNew[NAMEDATALEN],
+ nameOld[NAMEDATALEN],
+ nameTemp[NAMEDATALEN];
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+
+ /* The new index is going to use the name of the old index */
+ snprintf(nameNew, NAMEDATALEN, "%s", get_rel_name(newIndexOid));
+ snprintf(nameOld, NAMEDATALEN, "%s", get_rel_name(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ snprintf(nameTemp, NAMEDATALEN, "cct_%d", oldIndexOid);
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /*
+ * If the index swapped is a toast index, take an exclusive lock on its
+ * parent toast relation and then update reltoastidxid to the new index Oid
+ * value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ Relation toastRel;
+
+ /* Open the parent toast relation to update its pg_class entry */
+ toastRel = heap_open(parentOid, RowExclusiveLock);
+
+ /* Update the pg_class entry of this relation with the new toast index Oid */
+ index_update_stats(toastRel, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(toastRel, RowExclusiveLock);
+ }
+}
+
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a list of indexes in a concurrent process. Deletion has to be done
+ * through performMultipleDeletions, or the dependencies of the indexes
+ * are not dropped.
+ */
+void
+index_concurrent_drop(List *indexIds)
+{
+ ListCell *lc;
+ ObjectAddresses *objects = new_object_addresses();
+
+ Assert(indexIds != NIL);
+
+ /* Scan the list of indexes and build object list for normal indexes */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+
+ /* Register constraint or index for drop */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Add object to list */
+ add_exact_object_address(&object, objects);
+ }
+
+ /* Perform deletion for normal and toast indexes */
+ performMultipleDeletions(objects,
+ PERFORM_DELETION_CONCURRENTLY,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1325,7 +1599,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1464,13 +1737,7 @@ index_drop(Oid indexId, bool concurrent)
* not check for that. Also, prepared xacts are not reported, which
* is fine since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* No more predicate locks will be acquired on this index, and we're
@@ -1514,13 +1781,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1942,6 +2203,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * of the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1955,7 +2218,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -2070,7 +2334,8 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
+ (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) &&
+ istoastupdate ?
RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
@@ -3188,7 +3453,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 2979819..5181dbc 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, true);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75f9ff1..2bcf5b5 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,447 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexConcurrentIndexes
+ *
+ * Process REINDEX CONCURRENTLY for a given list of indexes.
+ * Each reindexing step is done simultaneously for all the given
+ * indexes. If no list of indexes is given by the caller, all the
+ * indexes of the relation are reindexed.
+ */
+bool
+ReindexConcurrentIndexes(Oid heapOid, List *indexIds)
+{
+ Relation heapRelation;
+ List *concurrentIndexIds = NIL,
+ *indexLocks = NIL,
+ *realIndexIds = NIL;
+ ListCell *lc, *lc2;
+ LockRelId heapLockId;
+ LOCKTAG heapLocktag;
+ Snapshot snapshot;
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index, a new index based on the same
+ * definition as the former one, except that it is only registered in the
+ * catalogs and will be built afterwards. It is possible to perform all
+ * those operations at the same time for all the indexes of a parent
+ * relation, including the indexes of its toast relation.
+ */
+
+ /*
+ * The lock level used here should match the one in index_concurrent_create();
+ * this prevents schema changes on the relation.
+ */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /*
+ * Get the list of indexes from the relation if the caller has not given
+ * anything. Invalid indexes cannot be reindexed concurrently; they cause
+ * an error here, while invalid indexes explicitly given by the caller are
+ * simply bypassed with a warning.
+ */
+ if (indexIds == NIL)
+ {
+ ListCell *cell;
+ foreach(cell, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\"",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ realIndexIds = lappend_oid(realIndexIds, cellOid);
+ }
+
+ /* Add also the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid, ShareUpdateExclusiveLock);
+
+ foreach(cell, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\"",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ realIndexIds = lappend_oid(realIndexIds, cellOid);
+ }
+
+ heap_close(toastRelation, ShareUpdateExclusiveLock);
+ }
+ }
+ else
+ {
+ ListCell *cell;
+ List *filteredList = NIL;
+ foreach(cell, indexIds)
+ {
+ Oid cellOid = lfirst_oid(cell);
+ Relation indexRelation = index_open(cellOid, ShareUpdateExclusiveLock);
+
+ /* Invalid indexes are not reindexed */
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ filteredList = lappend_oid(filteredList, cellOid);
+
+ /* Close relation */
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ }
+ realIndexIds = filteredList;
+ }
+
+ /* Definitely no indexes to work on, so leave */
+ if (realIndexIds == NIL)
+ {
+ heap_close(heapRelation, NoLock);
+ return false;
+ }
+
+ /* The relation on which the indexes are based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, realIndexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a plain table or a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /* Now open the relation of concurrent index, a lock is also needed on it */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid of each relation to protect it from being dropped,
+ * then close the relations. Note that a palloc'd copy of each lockrelid
+ * has to be stored, as the same local variable would otherwise be shared
+ * by all the list entries.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ indexLocks = lappend(indexLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ indexLocks = lappend(indexLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock for the following visibility checks; other backends
+ * might conflict with this session.
+ */
+ heapLockId = heapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heapLocktag, heapLockId.dbId, heapLockId.relId);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the indexes.
+ * This will prevent them from making incompatible HOT updates. The indexes
+ * are marked as not ready and invalid so that no other transactions will
+ * try to use them for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, each index
+ * and its concurrent copy, to ensure that none of them are dropped until
+ * the operation is done.
+ */
+ LockRelationIdForSession(&heapLockId, ShareUpdateExclusiveLock);
+
+ /* Lock each index and each concurrent index accordingly */
+ foreach(lc, indexLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * We need to wait until no running transactions could have the table open with
+ * the old list of indexes. A concurrent build is done for each concurrent
+ * index that will replace the old indexes. All those indexes share the same
+ * snapshot and they are built in the same transaction.
+ */
+ WaitForVirtualLocks(heapLocktag, ShareLock);
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, realIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid tableOid;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Index relation has been closed by previous commit, so reopen it to
+ * fetch the info needed, then close it again; the relcache entry must
+ * not be dereferenced after index_close().
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ tableOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(tableOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the table
+ * must insert new entries into the index for insertions and non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+ }
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready updates of the
+ * concurrent indexes visible to other transactions.
+ */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions that
+ * might have occurred in the parent table, and are marked as valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates.
+ */
+ WaitForVirtualLocks(heapLocktag, ShareLock);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ Relation indexRelation = index_open(indOid, ShareUpdateExclusiveLock);
+ relOid = indexRelation->rd_index->indrelid;
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+
+ /* Validate index, which might be a toast index */
+ validate_index(relOid, lfirst_oid(lc), snapshot);
+ }
+
+ /*
+ * Concurrent indexes can now be marked valid -- update pg_index entries
+ */
+ foreach(lc, concurrentIndexIds)
+ index_set_state_flags(lfirst_oid(lc), INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent indexes are now valid in the sense that they contain all
+ * currently interesting tuples. However, they might not contain tuples
+ * deleted just before the reference snapshot was taken, so we need to wait
+ * out transactions that might have an older snapshot than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause backends to update its entries for the
+ * concurrent index but it is necessary to do the same whing
+ */
+ CacheInvalidateRelcacheByRelid(heapLockId.relId);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, realIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Mark the old index as invalid, this needs to be done as the first
+ * action in this transaction.
+ */
+ index_set_state_flags(indOid, INDEX_DROP_CLEAR_VALID);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Open the old index and its parent relation for the relcache
+ * invalidation below.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /* Continue process inside a new transaction block */
+ StartTransactionCommand();
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait for
+ * transactions that might still use them.
+ */
+ WaitForVirtualLocks(heapLocktag, ShareLock);
+
+ /* Get fresh snapshot for this step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, realIndexIds)
+ index_set_state_flags(lfirst_oid(lc), INDEX_DROP_SET_DEAD);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible.
+ */
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion,
+ * or the dependencies of the old indexes will not be dropped with them.
+ */
+ index_concurrent_drop(realIndexIds);
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and on the indexes of the table.
+ */
+ UnlockRelationIdForSession(&heapLockId, ShareUpdateExclusiveLock);
+ foreach(lc, indexLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1877,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1904,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2023,26 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
void
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* This is all for the non-concurrent case */
+ if (!concurrent)
+ {
+ reindex_index(indOid, false);
+ return;
+ }
+
+ /* Continue through REINDEX CONCURRENTLY */
+ ReindexConcurrentIndexes(heapOid, list_make1_oid(indOid));
}
/*
@@ -1745,18 +2104,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
void
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /*
+ * Run through the concurrent process if requested; in that case all the
+ * work is done by ReindexConcurrentIndexes, so do not fall through to
+ * the plain reindex_relation() call below.
+ */
+ if (concurrent)
+ {
+ if (!ReindexConcurrentIndexes(heapOid, NIL))
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1773,7 +2144,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
void
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1785,6 +2159,12 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /* CONCURRENTLY is not allowed when system catalogs would be reindexed */
+ if (concurrent && do_system)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9387ee9..0685ae4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3601,6 +3601,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 95a95f4..cdea86a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1840,6 +1840,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ad98b36..db3a5f8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6670,15 +6670,16 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type qualified_name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
n->relation = $3;
n->name = NULL;
+ n->concurrent = $5;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
@@ -6686,9 +6687,10 @@ ReindexStmt:
n->relation = NULL;
n->do_system = true;
n->do_user = false;
+ n->concurrent = $5;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE name opt_force opt_concurrently
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
@@ -6696,6 +6698,7 @@ ReindexStmt:
n->relation = NULL;
n->do_system = true;
n->do_user = true;
+ n->concurrent = $5;
$$ = (Node *)n;
}
;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 94f58a9..40dedde 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,114 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction holds the relation referred to by the given
+ * LOCKTAG in a mode conflicting with the given lockmode.
+ * To do this, inquire which xacts currently would conflict with lockmode on
+ * the relation referred to by the LOCKTAG -- ie, which ones have a lock that
+ * permits writing the relation. Then wait for each of these xacts to commit
+ * or abort.
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a42b8e9..9424140 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1255,15 +1255,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1275,8 +1279,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index b96099f..e15605c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,20 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid indexOid1, Oid indexOid2);
+
+extern void index_concurrent_drop(List *IndexIds);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +101,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 2c81b78..43dfa15 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern void ReindexIndex(RangeVar *indexRelation);
-extern void ReindexTable(RangeVar *relation);
+extern void ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern void ReindexTable(RangeVar *relation, bool concurrent);
extern void ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexConcurrentIndexes(Oid heapOid, List *indexIds);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 8834499..46bc532 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2511,6 +2511,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 9933dad..2e2d9dc 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..4053b53
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE reind_con_tab CONCURRENTLY; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..26bd952 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,43 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE concur_reindex_tab CONCURRENTLY; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE INDEX concur_reindex_tab1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_tab2 ON concur_reindex_tab(c2);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX concur_reindex_tab1 CONCURRENTLY;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE pg_database CONCURRENTLY; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX DATABASE postgres CONCURRENTLY; -- not allowed for DATABASE
+ERROR: cannot reindex system concurrently
+REINDEX SYSTEM postgres CONCURRENTLY; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer |
+ c2 | text |
+Indexes:
+ "concur_reindex_tab1" btree (c1)
+ "concur_reindex_tab2" btree (c2)
+
+DROP TABLE concur_reindex_tab;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..be9c5cc 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,31 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE concur_reindex_tab CONCURRENTLY; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE INDEX concur_reindex_tab1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_tab2 ON concur_reindex_tab(c2);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX concur_reindex_tab1 CONCURRENTLY;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE concur_reindex_tab CONCURRENTLY;
+COMMIT;
+REINDEX TABLE pg_database CONCURRENTLY; -- no shared relation
+REINDEX DATABASE postgres CONCURRENTLY; -- not allowed for DATABASE
+REINDEX SYSTEM postgres CONCURRENTLY; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab;
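To summarize the control flow of the patch for discussion, here is a rough Python sketch (illustrative only, not backend code; the class, field and function names are mine) of how the pg_index flags of the old and new "_cct" indexes evolve across the six phases described above, each separated in the patch by transaction commits and waits:

```python
# Simplified model of the REINDEX CONCURRENTLY phases.
# ready mirrors indisready (index receives inserts), valid mirrors
# indisvalid (index is usable by queries).

class Index:
    def __init__(self, name, ready=True, valid=True):
        self.name = name
        self.ready = ready
        self.valid = valid

def reindex_concurrently(old):
    # Phase 1: create the new index, invalid and not ready
    new = Index(old.name + "_cct", ready=False, valid=False)
    # Phase 2: build it concurrently, then mark it ready for inserts
    new.ready = True
    # Phase 3: validate it (catch up missing entries), then mark it valid
    new.valid = True
    # Phase 4: swap names; the old index becomes invalid
    old.name, new.name = new.name, old.name
    old.valid = False
    # Phase 5: mark the old index as not ready (no longer maintained)
    old.ready = False
    # Phase 6: drop the old index; the survivor keeps the original name
    return new

idx = reindex_concurrently(Index("concur_reindex_tab1"))
print(idx.name, idx.ready, idx.valid)   # -> concur_reindex_tab1 True True
```

The sketch makes visible why a failure before phase 4 leaves behind an invalid "_cct" index rather than a broken table.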
On 10 December 2012 06:03, Michael Paquier <michael.paquier@gmail.com> wrote:
On 2012-12-08 09:40:43 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly. Or just generate a new name using the
same rules that CREATE INDEX would when no name is specified. Yeah,
it's a hack, but what about the CONCURRENTLY commands isn't a hack?

I have no problem with ending up with a new name or something like
that. If that is what it takes: fine, no problem.

For the indexes that are created internally by the system like toast or
internal primary keys this is acceptable. However in the case of indexes
that have been created externally I do not think it is acceptable as this
impacts the user that created those indexes with a specific name.
If I have to choose between (1) keeping the same name OR (2) avoiding
an AccessExclusiveLock then I would choose (2). Most other people
would also, especially when all we would do is add/remove an
underscore. Even if that is user visible. And if it is we can support
a LOCK option that does (1) instead.
If we make it an additional constraint on naming, it won't be a
problem... namely that you can't create an index with/without an
underscore at the end, if a similar index already exists that has an
identical name apart from the suffix.
There are few, if any, commands that need the index name to remain the
same. For those, I think we can bend them to accept the index name and
then add/remove the underscore to get that to work.
That's all a little bit crappy, but this is too small a problem with
an important feature to allow us to skip.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Michael Paquier
http://michael.otacoo.com
On 2012/12/10, at 18:28, Simon Riggs <simon@2ndQuadrant.com> wrote:
Ok. Removing the switch name part is only deleting 10 lines of code in index_concurrent_swap.
Then, do you guys have a preferred format for the concurrent index name? For the time being an inelegant _cct suffix is used. The underscore at the end?
Michael
On 2012-12-10 15:03:59 +0900, Michael Paquier wrote:
I have updated the patch (v4) to take care of updating reltoastidxid for
toast parent relations at the swap step by using index_update_stats. In
prior versions of the patch this was done when the concurrent index was
built, leading to toast relations using invalid indexes if there was a
failure before the swap phase. The update of reltoastidxid is done with
RowExclusiveLock.
I also added a couple of tests in src/test/isolation. Btw, as the swap step
uses an AccessExclusiveLock to switch the old and new relnames for the time
being, there is not much point in running those tests yet...
Btw, as an example of the problems caused by renaming:
postgres=# CREATE TABLE a (id serial primary key); CREATE TABLE b(id
serial primary key, a_id int REFERENCES a);
CREATE TABLE
Time: 137.840 ms
CREATE TABLE
Time: 143.500 ms
postgres=# \d b
Table "public.b"
Column | Type | Modifiers
--------+---------+------------------------------------------------
id | integer | not null default nextval('b_id_seq'::regclass)
a_id | integer |
Indexes:
"b_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
"b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)
postgres=# REINDEX TABLE a CONCURRENTLY;
NOTICE: drop cascades to constraint b_a_id_fkey on table b
REINDEX
Time: 248.992 ms
postgres=# \d b
Table "public.b"
Column | Type | Modifiers
--------+---------+------------------------------------------------
id | integer | not null default nextval('b_id_seq'::regclass)
a_id | integer |
Indexes:
"b_pkey" PRIMARY KEY, btree (id)
Looking at the patch for a bit now.
Regards,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-10 15:51:40 +0100, Andres Freund wrote:
On 2012-12-10 15:03:59 +0900, Michael Paquier wrote:
I have updated the patch (v4) to take care of updating reltoastidxid for
toast parent relations at the swap step by using index_update_stats. In
prior versions of the patch this was done when concurrent index was built,
leading to toast relations using invalid indexes if there was a failure
before the swap phase. The update of reltoastidxids of toast relation is
done with RowExclusiveLock.
I also added a couple of tests in src/test/isolation. Btw, since for the time
being the swap step uses AccessExclusiveLock to switch the old and new
relnames, there is not much point in running them yet...
Looking at the patch for a bit now.
Some review comments:
* Some of the added !is_reindex in index_create don't seem safe to
me. Why do we now support reindexing exclusion constraints?
* REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
concurrent reindexing for user-tables and non-concurrent for system
tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...
* ISTM index_concurrent_swap should get exclusive locks on the relation
*before* printing their names. This shouldn't be required because we
have a lock prohibiting schema changes on the parent table, but it
feels safer.
* temporary index names during swapping should also be named via
ChooseIndexName
* why does create_toast_table pass an unconditional 'is_reindex' to
index_create?
* would be nice (but thats probably a step #2 thing) to do the
individual steps of concurrent reindex over multiple relations to
avoid too much overall waiting for other transactions.
* ReindexConcurrentIndexes:
* says " Such indexes are simply bypassed if caller has not specified
anything." but ERROR's. Imo ERROR is fine, but the comment should be
adjusted...
* should perhaps be named ReindexIndexesConcurrently?
* Imo the PHASE 1 comment should be after gathering/validating the
chosen indexes
* It seems better to me to use individual transactions + snapshots
for each index, no need to keep very long transactions open (PHASE
2/3)
* s/same whing/same thing/
* Shouldn't a CacheInvalidateRelcacheByRelid be done after PHASE 2 and
5 as well?
* PHASE 6 should acquire exclusive locks on the indexes
* can some of index_concurrent_* infrastructure be reused for
DROP INDEX CONCURRENTLY?
* in CREATE/DROP INDEX CONCURRENTLY, CONCURRENTLY comes before the
object name, should we keep that convention?
That's all I have for now.
Very nice work! Imo the code looks cleaner after your patch...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Michael Paquier <michael.paquier@gmail.com> writes:
On 2012/12/10, at 18:28, Simon Riggs <simon@2ndQuadrant.com> wrote:
If I have to choose between (1) keeping the same name OR (2) avoiding
an AccessExclusiveLock then I would choose (2). Most other people
would also, especially when all we would do is add/remove an
underscore. Even if that is user visible. And if it is we can support
a LOCK option that does (1) instead.
Ok. Removing the switch-name part only means deleting 10 lines of code in index_concurrent_swap.
Then, do you guys have a preferred format for the concurrent index name? For the time being an inelegant _cct suffix is used. Or just an underscore at the end?
You still need to avoid conflicting name assignments, so my
recommendation would really be to use the select-a-new-name code already
in use for CREATE INDEX without an index name. The underscore idea is
cute, but I doubt it's worth the effort to implement, document, or
explain it in a way that copes with repeated REINDEXes and conflicts.
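For reference, the select-a-new-name behavior Tom refers to is what CREATE
INDEX already does when no index name is given; it appends a numeric suffix
until the name is free:

```sql
CREATE TABLE tab (col integer);
CREATE INDEX ON tab (col);   -- auto-named tab_col_idx
CREATE INDEX ON tab (col);   -- conflict avoided: auto-named tab_col_idx1
```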
regards, tom lane
On 12/8/12 9:40 AM, Tom Lane wrote:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.
If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?
On 10 December 2012 22:18, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/8/12 9:40 AM, Tom Lane wrote:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.
If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?
Because the index isn't being renamed. An alternate equivalent index
is being created instead.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 12/10/12 5:21 PM, Simon Riggs wrote:
On 10 December 2012 22:18, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/8/12 9:40 AM, Tom Lane wrote:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.
If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?
Because the index isn't being renamed. An alternate equivalent index
is being created instead.
Right, basically, you can do this right now using
CREATE INDEX CONCURRENTLY ${name}_tmp ...
DROP INDEX CONCURRENTLY ${name};
ALTER INDEX ${name}_tmp RENAME TO ${name};
The only tricks here are if ${name}_tmp is already taken, in which case
you might as well just error out (or try a few different names), and if
${name} is already in use by the time you get to the last line, in which
case you can log a warning or an error.
What am I missing?
On 10 December 2012 22:27, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/10/12 5:21 PM, Simon Riggs wrote:
On 10 December 2012 22:18, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/8/12 9:40 AM, Tom Lane wrote:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.
If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?
Because the index isn't being renamed. An alternate equivalent index
is being created instead.
Right, basically, you can do this right now using
CREATE INDEX CONCURRENTLY ${name}_tmp ...
DROP INDEX CONCURRENTLY ${name};
ALTER INDEX ${name}_tmp RENAME TO ${name};
The only tricks here are if ${name}_tmp is already taken, in which case
you might as well just error out (or try a few different names), and if
${name} is already in use by the time you get to the last line, in which
case you can log a warning or an error.
What am I missing?
That this is already recorded in my book ;-)
And also that REINDEX CONCURRENTLY doesn't work like that, yet.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-10 17:27:45 -0500, Peter Eisentraut wrote:
On 12/10/12 5:21 PM, Simon Riggs wrote:
On 10 December 2012 22:18, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/8/12 9:40 AM, Tom Lane wrote:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.
If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?
Because the index isn't being renamed. An alternate equivalent index
is being created instead.
Right, basically, you can do this right now using
CREATE INDEX CONCURRENTLY ${name}_tmp ...
DROP INDEX CONCURRENTLY ${name};
ALTER INDEX ${name}_tmp RENAME TO ${name};
The only tricks here are if ${name}_tmp is already taken, in which case
you might as well just error out (or try a few different names), and if
${name} is already in use by the time you get to the last line, in which
case you can log a warning or an error.
What am I missing?
I don't think this is the problematic side of the patch.
The question is rather how to transfer the dependencies without too much
ugliness or how to swap oids without a race. Either by accepting an
exclusive lock or by playing some games, the latter possibly being easier
with renaming...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2012-12-10 22:33:50 +0000, Simon Riggs wrote:
On 10 December 2012 22:27, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/10/12 5:21 PM, Simon Riggs wrote:
On 10 December 2012 22:18, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/8/12 9:40 AM, Tom Lane wrote:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.
If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?
Because the index isn't being renamed. An alternate equivalent index
is being created instead.
Right, basically, you can do this right now using
CREATE INDEX CONCURRENTLY ${name}_tmp ...
DROP INDEX CONCURRENTLY ${name};
ALTER INDEX ${name}_tmp RENAME TO ${name};
The only tricks here are if ${name}_tmp is already taken, in which case
you might as well just error out (or try a few different names), and if
${name} is already in use by the time you get to the last line, in which
case you can log a warning or an error.
What am I missing?
That this is already recorded in my book ;-)
And also that REINDEX CONCURRENTLY doesn't work like that, yet.
The last submitted patch works pretty similarly:
CREATE INDEX CONCURRENTLY $name_cct;
ALTER INDEX $name RENAME TO $name_tmp;
ALTER INDEX $name_cct RENAME TO $name;
ALTER INDEX $name_tmp RENAME TO $name_cct;
DROP INDEX CONCURRENTLY $name_cct;
It does that under an exclusive lock, but doesn't handle dependencies
yet...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Dec 10, 2012 at 11:51 PM, Andres Freund <andres@2ndquadrant.com>wrote:
Btw, as an example of the problems caused by renaming:
postgres=# CREATE TABLE a (id serial primary key); CREATE TABLE b(id
serial primary key, a_id int REFERENCES a);
CREATE TABLE
Time: 137.840 ms
CREATE TABLE
Time: 143.500 ms
postgres=# \d b
            Table "public.b"
 Column |  Type   |                   Modifiers
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('b_id_seq'::regclass)
 a_id   | integer |
Indexes:
    "b_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "b_a_id_fkey" FOREIGN KEY (a_id) REFERENCES a(id)
postgres=# REINDEX TABLE a CONCURRENTLY;
NOTICE: drop cascades to constraint b_a_id_fkey on table b
REINDEX
Time: 248.992 ms
postgres=# \d b
            Table "public.b"
 Column |  Type   |                   Modifiers
--------+---------+------------------------------------------------
 id     | integer | not null default nextval('b_id_seq'::regclass)
 a_id   | integer |
Indexes:
    "b_pkey" PRIMARY KEY, btree (id)
Oops. I will fix that in the next version of the patch. There should be an
elegant way to change the dependencies at the swap phase.
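The entries that the swap phase would have to move can be inspected from SQL;
a sketch of the pg_depend rows still pointing at the old index, using a_pkey
from the example above as the assumed old index name:

```sql
-- Dependency records referencing the old index; after a correct swap,
-- the foreign key's entry should point at the new index instead.
SELECT classid::regclass AS dependent_catalog,
       objid,
       deptype
FROM pg_depend
WHERE refclassid = 'pg_class'::regclass
  AND refobjid = 'a_pkey'::regclass;
```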
--
Michael Paquier
http://michael.otacoo.com
On Mon, Dec 10, 2012 at 5:18 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/8/12 9:40 AM, Tom Lane wrote:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.
If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?
Yeah... and also, why do you think that? I thought the idea that we
could do any such thing had been convincingly refuted.
Frankly, I think that if REINDEX CONCURRENTLY is just shorthand for
"CREATE INDEX CONCURRENTLY with a different name and then DROP INDEX
CONCURRENTLY on the old name", it's barely worth doing. People can do
that already, and do, and then we don't have to explain the wart that
the name changes under you.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2012-12-11 15:23:52 -0500, Robert Haas wrote:
On Mon, Dec 10, 2012 at 5:18 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On 12/8/12 9:40 AM, Tom Lane wrote:
I'm tempted to propose that REINDEX CONCURRENTLY simply not try to
preserve the index name exactly. Something like adding or removing
trailing underscores would probably serve to generate a nonconflicting
name that's not too unsightly.
If you think you can rename an index without an exclusive lock, then why
not rename it back to the original name when you're done?
Yeah... and also, why do you think that? I thought the idea that we
could do any such thing had been convincingly refuted.
Frankly, I think that if REINDEX CONCURRENTLY is just shorthand for
"CREATE INDEX CONCURRENTLY with a different name and then DROP INDEX
CONCURRENTLY on the old name", it's barely worth doing. People can do
that already, and do, and then we don't have to explain the wart that
the name changes under you.
It's fundamentally different in that you can do it with constraints
referencing the index present. And that it works with toast tables.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Thanks for all your comments.
The new version (v5) of this patch fixes the error you found when
reindexing indexes being referenced in foreign keys.
The fix is done with switchIndexConstraintOnForeignKey:pg_constraint.c, in
charge of scanning pg_constraint for foreign keys that refer to the parent
relation (confrelid) of the index being swapped, and then switching conindid
to the new index if the old index was referenced.
This API also takes care of switching the dependency between the foreign
key and the old index by calling changeDependencyFor.
I also added a regression test for this purpose.
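The scan described above can be approximated in SQL; a sketch of the foreign
keys switchIndexConstraintOnForeignKey has to visit when an index on a
referenced table a is swapped:

```sql
-- Foreign keys depending on an index of the referenced table "a";
-- these are the pg_constraint rows whose conindid must be switched.
SELECT conname,
       conrelid::regclass AS referencing_table,
       conindid::regclass AS depended_on_index
FROM pg_constraint
WHERE contype = 'f'
  AND confrelid = 'a'::regclass;
```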
On Tue, Dec 11, 2012 at 12:28 AM, Andres Freund <andres@2ndquadrant.com>wrote:
Some review comments:
* Some of the added !is_reindex in index_create don't seem safe to
me.
This is added to control concurrent index creation for toast indexes. If we
did not add an additional flag for that, it would not be possible to reindex
a toast index concurrently.
* Why do we now support reindexing exclusion constraints?
CREATE INDEX CONCURRENTLY is not supported for exclusion constraints, but I
played around with exclusion constraints with my patch and did not
particularly see any problems in supporting them, as for example index_build
performs a second scan of the heap when running, so it looks solid enough
for that. Is it because the structure of the REINDEX CONCURRENTLY patch is
different? Honestly I think not, so is there something I am not aware of?
* REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
concurrent reindexing for user-tables and non-concurrent for system
tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...
OK. I thought that this was out of scope for the time being. I haven't done
anything about that yet. Supporting that will not be complicated as
ReindexRelationsConcurrently (new API) is more flexible now, the only thing
needed is to gather the list of relations that need to be reindexed.
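Gathering that list amounts to a catalog scan along these lines (a sketch in
SQL; the real code would do the equivalent at the C level):

```sql
-- Candidate user tables for the concurrent pass of a hypothetical
-- REINDEX DATABASE ... CONCURRENTLY; system catalogs would be handled
-- non-concurrently.
SELECT c.oid::regclass
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND n.nspname NOT LIKE 'pg_toast%';
```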
* ISTM index_concurrent_swap should get exclusive locks on the relation
*before* printing their names. This shouldn't be required because we
have a lock prohibiting schema changes on the parent table, but it
feels safer.
Done. AccessExclusiveLock is taken before calling RenameRelationInternal
now.
* temporary index names during swapping should also be named via
ChooseIndexName
Done. I used ChooseRelationName instead, which is externalized through
defrem.h.
* why does create_toast_table pass an unconditional 'is_reindex' to
index_create?
Done. The flag is changed to false.
* would be nice (but thats probably a step #2 thing) to do the
individual steps of concurrent reindex over multiple relations to
avoid too much overall waiting for other transactions.
I think I did that by now using one transaction per index for each
operation except the drop phase...
* ReindexConcurrentIndexes:
I renamed ReindexConcurrentIndexes to ReindexRelationsConcurrently and
changed the arguments it used to something more generic:
ReindexRelationsConcurrently(List *relationIds)
relationIds is a list of relation OIDs that can include table and/or
index OIDs.
Based on this list of relation Oid, we build the list of indexes that are
rebuilt, including the toast indexes if necessary.
* says " Such indexes are simply bypassed if caller has not specified
anything." but ERROR's. Imo ERROR is fine, but the comment should be
adjusted...
Done.
* should perhaps be names ReindexIndexesConcurrently?
Kind of done.
* Imo the PHASE 1 comment should be after gathering/validating the
chosen indexes
Comment is moved. Thanks.
* It seems better to me to do use individual transactions + snapshots
for each index, no need to keep very long transactions open (PHASE
2/3)
Good point. I did that. Now individual transactions are used for each index.
* s/same whing/same thing/
Done.
* Shouldn't a CacheInvalidateRelcacheByRelid be done after PHASE 2 and
5 as well?
Done. Nice catch.
* PHASE 6 should acquire exclusive locks on the indexes
The necessary lock is taken when calling index_drop through
performMultipleDeletions. Do you think it is not enough and that I should
add an exclusive lock inside index_concurrent_drop?
* can some of index_concurrent_* infrastructure be reused for
DROP INDEX CONCURRENTLY?
Indeed. After looking at the code I found that two steps are done in a
concurrent context: invalidating the index and setting it as dead.
As REINDEX CONCURRENTLY does the following 2 steps in batch for a list of
indexes, I added index_concurrent_set_dead to set up the dropped indexes as
dead, and index_concurrent_clear_valid. Those 2 functions are used by both
REINDEX CONCURRENTLY and DROP INDEX CONCURRENTLY.
* in CREATE/DROP INDEX CONCURRENTLY, CONCURRENTLY comes before the
object name, should we keep that convention?
Good point. I changed the grammar to REINDEX obj [ CONCURRENTLY ] objname.
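With that change, the syntax in this version of the patch mirrors CREATE/DROP
INDEX CONCURRENTLY:

```sql
REINDEX INDEX CONCURRENTLY ind;
REINDEX TABLE CONCURRENTLY tab;
```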
Thanks,
--
Michael Paquier
http://michael.otacoo.com
Attachments:
20121217_reindex_concurrently_v5.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..ba13703 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,10 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will not perform a concurrent build if <literal>
+ CONCURRENTLY</> is not specified. To build the index without interfering
+ with production you should drop the index and reissue the <command>CREATE
+ INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> command.
</para>
</listitem>
@@ -139,6 +140,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +247,93 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index that will replace the one to
+ be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to make the new
+ index valid for the other backends. Once this is performed, the old
+ and fresh indexes are swapped, and the old index is marked as invalid
+ in a third transaction. Finally two additional transactions are used to mark
+ the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and perform <command>REINDEX CONCURRENTLY</> once again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ </refsect2>
</refsect1>
<refsect1>
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 11086e2..2975bd7 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1141,7 +1141,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index d93d273..740e9f0 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2642,7 +2642,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 66012ac..537113f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -42,6 +42,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -671,6 +672,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be a toast relation. Sufficient locks are normally taken on
+ * the related relations once this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -694,7 +699,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -737,19 +743,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1083,7 +1093,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1095,6 +1105,381 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken
+ * during this operation so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /*
+ * Now build the index. If the parent relation is a toast relation, its
+ * reltoastidxid is updated when index_concurrent_swap is called.
+ */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new one in a concurrent context. For the
+ * time being, what is done here is switching the relation names of the two
+ * indexes. If extra operations become necessary during a concurrent swap,
+ * processing should be added here. AccessExclusiveLock is taken on the
+ * index relations that are swapped until the end of the transaction in
+ * which this function is called. For toast indexes, it is also necessary
+ * to update reltoastidxid of the parent relation, so RowExclusiveLock is
+ * also taken on it until the end of the transaction block.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel;
+
+ /*
+ * Take a lock on the old and new indexes before switching their names.
+ * This avoids having the index swap rely on the relation renaming
+ * mechanism to get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName(nameOld,
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally, give the old index the original name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The locks taken previously are not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * If the swapped index is a toast index, take RowExclusiveLock on its
+ * parent toast relation and update its reltoastidxid to the new index
+ * Oid value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ Relation toastRel;
+
+ /* Open the parent toast relation */
+ toastRel = heap_open(parentOid, RowExclusiveLock);
+
+ /* Update reltoastidxid of its pg_class entry with the new toast index Oid */
+ index_update_stats(toastRel, false, false, newIndexOid, -1.0);
+
+ /* Close the toast relation, keeping the lock until end of transaction */
+ heap_close(toastRel, NoLock);
+ }
+
+ /*
+ * Scan for potential foreign keys referencing the index being swapped and
+ * switch their dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query. To do this, inquire which xacts currently would
+ * conflict with AccessExclusiveLock on the table -- ie, which ones
+ * have a lock of any kind on the table. Then wait for each of these
+ * xacts to commit or abort. Note we do not need to worry about xacts
+ * that open the table for reading after this point; they will see the
+ * index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need
+ * not check for that. Also, prepared xacts are not reported, which
+ * is fine since they certainly aren't going to do anything more.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of its
+ * parent relation. This function should be called when initiating an index
+ * drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a list of indexes as the last step of a concurrent process. Deletion
+ * is done through performMultipleDeletions; otherwise the dependencies of
+ * the indexes would not be dropped. At this point all the indexes are
+ * already considered invalid and dead, so they can be dropped without using
+ * any concurrent option.
+ */
+void
+index_concurrent_drop(List *indexIds)
+{
+ ListCell *lc;
+ ObjectAddresses *objects = new_object_addresses();
+
+ Assert(indexIds != NIL);
+
+ /* Scan the list of indexes and build the list of objects to delete */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+
+ /* Register constraint or index for drop */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Add object to list */
+ add_exact_object_address(&object, objects);
+ }
+
+ /* Perform deletion for normal and toast indexes */
+ performMultipleDeletions(objects,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1325,7 +1710,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1407,17 +1791,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1445,63 +1820,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1514,13 +1834,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1942,6 +2256,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * of the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1955,7 +2271,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -2070,7 +2387,8 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
+ (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) &&
+ istoastupdate ?
RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
@@ -3188,7 +3506,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 5e8c6da..55c092d 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,75 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign key references from a given index to a new index created
+ * concurrently. This is used when swapping indexes during a concurrent
+ * process. Constraints that are not referenced externally, like primary
+ * keys or unique indexes, are switched through the machinery of index.c
+ * for concurrent index creation and drop.
+ * This function also takes care of switching the pg_depend dependencies of
+ * the foreign keys from the old index to the new index.
+ *
+ * The process consists of the following steps:
+ * 1) Scan pg_constraint and extract the list of foreign keys whose
+ * confrelid is the parent relation of the index being swapped.
+ * 2) Among them, find the foreign keys that use the old index as reference,
+ * through conindid.
+ * 3) Update conindid to the new index Oid on all those foreign keys.
+ * 4) Switch the dependencies of these foreign keys to the new index.
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated with
+ * the parent relation, scanning on confrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /* Found an index, so update its pg_constraint entry */
+ contuple->conindid = newIndexOid;
+ /* And write it back in place */
+ heap_inplace_update(conRel, htup);
+
+ /*
+ * Switch all the dependencies of this foreign key from the
+ * old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 2979819..2c908f5 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 75f9ff1..dc6daff 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,547 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationsConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given list of relation Oids. The list
+ * of indexes to rebuild is extracted from this input list, whose elements
+ * can be either relations or indexes.
+ * Each reindexing step is done simultaneously for all the extracted indexes.
+ */
+bool
+ReindexRelationsConcurrently(List *relationIds)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes to rebuild, based on the list of relation
+ * Oids given by the caller. For each element of the list, if the relation
+ * is a table, all its valid indexes are rebuilt, including the indexes of
+ * its toast table, if any. If the relation is an index, the index itself
+ * is rebuilt.
+ */
+ foreach(lc, relationIds)
+ {
+ Oid relationOid = lfirst_oid(lc);
+
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds,
+ cellOid);
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds, cellOid);
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ }
+
+ heap_close(toastRelation, ShareUpdateExclusiveLock);
+ }
+
+ heap_close(heapRelation, ShareUpdateExclusiveLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be reindexed concurrently.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds, relationOid);
+
+ index_close(indexRelation, ShareUpdateExclusiveLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ /*
+ * Build a unique list of parent relation Oids from the extracted index
+ * list. This list is used to take session locks on the parent relations
+ * of the indexes, preventing a concurrent drop of the relations involved
+ * in the concurrent reindex.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid parentOid = IndexGetRelation(lfirst_oid(lc), false);
+ parentRelationIds = list_append_unique_oid(parentRelationIds, parentOid);
+ }
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. For
+ * each index, we first need to create a new index based on the same
+ * definition as the former one; at this stage it is only registered in
+ * the catalogs, and will be built afterwards. All the operations can be
+ * performed at the same time for all the indexes of a parent relation,
+ * including its toast indexes.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a plain or toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock on it is also
+ * needed
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelids to protect the index relations from being
+ * dropped, then close them. The entries appended must be palloc'd
+ * copies: the list outlives this loop iteration, so storing the
+ * address of the local variable would leave dangling pointers. The
+ * lockrelid of the parent relation is not taken here to avoid taking
+ * multiple locks on the same relation; instead we rely on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks for the subsequent visibility checks; other
+ * backends might hold locks that conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the parent relation's lockrelid to the list
+ * of locked relations; the list must not point at loop-local storage.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transaction will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation, the
+ * old index and its concurrent copy to ensure that none of them are
+ * dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build concurrent indexes in a separate transaction for each index to
+ * avoid keeping transactions open for an unnecessarily long time. We also
+ * need to wait until no running transactions could have the parent table
+ * of index open. A concurrent build is done for each concurrent
+ * index that will replace the old indexes.
+ */
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Perform concurrent build of new index. Use relOid here; indexRel
+ * must not be dereferenced after it has been closed.
+ */
+ index_concurrent_build(relOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit all
+ * sessions will refresh any cached plans that might reference the index.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs that
+ * might have occurred in the parent table, and are marked valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * with a separate transaction to avoid keeping transactions open for an
+ * unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(lc2, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /*
+ * Take the reference snapshot that will be used to validate the
+ * concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent index is now valid, as it contains all the necessary
+ * tuples. However, it might not reflect tuples deleted just before the
+ * reference snapshot was taken, so we need to wait for the transactions
+ * that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause other backends to refresh their entries
+ * for the concurrent index, but the relcache needs to be invalidated too.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it unusable by
+ * other backends once the swapping transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation; they are needed for the
+ * swap and for the cache invalidation that follows.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for transactions that might still be using them. Each operation is
+ * performed in a separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ LOCKTAG *heapLockTag = NULL;
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(lc2, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /* Finish the index invalidation and set it as dead */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion
+ * or the dependencies of the old indexes will not be dropped. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
+ * indexes are already considered as dead and invalid, so they will not
+ * be used by other backends.
+ */
+ index_concurrent_drop(indexIds);
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1977,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +2004,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2123,26 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
void
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* This is all for the non-concurrent case */
+ if (!concurrent)
+ {
+ reindex_index(indOid, false);
+ return;
+ }
+
+ /* Continue through REINDEX CONCURRENTLY */
+ ReindexRelationsConcurrently(list_make1_oid(indOid));
}
/*
@@ -1745,18 +2204,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
void
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /*
+ * Run through the concurrent process if necessary. Note that we must
+ * return afterwards, so that a successful concurrent reindex is not
+ * followed by a second, non-concurrent reindex_relation() below.
+ */
+ if (concurrent)
+ {
+ if (!ReindexRelationsConcurrently(list_make1_oid(heapOid)))
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1773,7 +2244,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
void
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1785,6 +2259,12 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /* System catalogs cannot be reindexed concurrently */
+ if (concurrent && do_system)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 9387ee9..0685ae4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3601,6 +3601,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 95a95f4..cdea86a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1840,6 +1840,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index ad98b36..b5283cc 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6670,29 +6670,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 94f58a9..40dedde 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,114 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction can have the table open with the index marked as
+ * read-only for updates.
+ * To do this, inquire which xacts currently would conflict with ShareLock on
+ * the table referred by the LOCKTAG -- ie, which ones have a lock that permits
+ * writing the table. Then wait for each of these xacts to commit or abort.
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a42b8e9..9424140 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1255,15 +1255,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1275,8 +1279,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index b96099f..0cf0da4 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,24 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(List *indexIds);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +105,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 238fe58..d68ccca 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a syscache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index e4e9c40..e9921c4 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 2c81b78..6210678 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern void ReindexIndex(RangeVar *indexRelation);
-extern void ReindexTable(RangeVar *relation);
+extern void ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern void ReindexTable(RangeVar *relation, bool concurrent);
extern void ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationsConcurrently(List *relationIds);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 8834499..46bc532 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2511,6 +2511,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 9933dad..2e2d9dc 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..4053b53
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE reind_con_tab CONCURRENTLY; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..612089c 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,48 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX DATABASE CONCURRENTLY postgres; -- not allowed for DATABASE
+ERROR: cannot reindex system concurrently
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..b77c7a4 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,34 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX DATABASE CONCURRENTLY postgres; -- not allowed for DATABASE
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2012-12-17 11:44:00 +0900, Michael Paquier wrote:
Thanks for all your comments.
The new version (v5) of this patch fixes the error you found when
reindexing indexes being referenced in foreign keys.
The fix is done with switchIndexConstraintOnForeignKey:pg_constraint.c, in
charge of scanning pg_constraint for foreign keys that refer the parent
relation (confrelid) of the index being swapped and then switch conindid to
the new index if the old index was referenced.
This API also takes care of switching the dependency between the foreign
key and the old index by calling changeDependencyFor.
I also added a regression test for this purpose.
Ok. Are there no other depencencies towards indexes? I don't know of any
right now, but I have the feeling there were some other cases.
On Tue, Dec 11, 2012 at 12:28 AM, Andres Freund <andres@2ndquadrant.com>wrote:
Some review comments:
* Some of the added !is_reindex in index_create don't seem safe to
me.
This is added to control the concurrent index relation for toast indexes. If
we do not add an additional flag for that, it will not be possible to
reindex a toast index concurrently.
I think some of them were added for cases that didn't seem to be related
to that. I'll recheck in the current version.
* Why do we now support reindexing exclusion constraints?
CREATE INDEX CONCURRENTLY is not supported for exclusion constraints, but I
played around with exclusion constraints with my patch and did not see any
particular problems in supporting them; for example, index_build performs a
second scan of the heap when running, so it looks solid enough for that. Is
it because the structure of the REINDEX CONCURRENTLY patch is different?
Honestly I think not, so is there something I am not aware of?
I think I asked because you had added an && !is_reindex to one of the
checks.
If I recall the reason why concurrent index builds couldn't support
exclusion constraints correctly - namely that we cannot use them to
check for new row versions when the index is in the ready && !valid
state - that shouldn't be a problem when we have a valid version of an
old index around because that enforces everything. It would maybe need
an appropriate if (!isvalid) in the exclusion constraint code, but that
should be it.
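For context, a minimal exclusion constraint looks like this (a sketch with invented names; btree_gist is assumed for the equality operator on the integer column):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- An exclusion constraint: no two rows may book the same room for
-- overlapping time ranges. Enforcement relies on a supporting GiST index.
CREATE TABLE room_booking (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);
```

The question above is whether the supporting index of such a constraint can be rebuilt concurrently, given that a valid old index keeps enforcing the constraint throughout the rebuild.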
* REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
concurrent reindexing for user-tables and non-concurrent for system
tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...

OK. I thought that this was out of scope for the time being. I haven't done
anything about that yet. Supporting that will not be complicated as
ReindexRelationsConcurrently (new API) is more flexible now, the only thing
needed is to gather the list of relations that need to be reindexed.
Imo that so greatly reduces the usability of this patch that you should
treat it as in scope ;). Especially as you say, it really shouldn't be
that much work with all the groundwork built.
* would be nice (but that's probably a step #2 thing) to do the
individual steps of concurrent reindex over multiple relations to
avoid too much overall waiting for other transactions.

I think I did that by now using one transaction per index for each
operation except the drop phase...
Without yet having read the new version, I think that's not what I
meant. There currently is a wait for concurrent transactions to end
after most of the phases for every relation, right? If you have a busy
database with somewhat long-running transactions that's going to slow
everything down with waiting quite a bit. I wondered whether it would make
sense to do PHASE1 for all indexes in all relations, then wait once,
then PHASE2...
That obviously has some space and index maintenance overhead issues, but
it's probably sensible anyway in many cases.
* PHASE 6 should acquire exclusive locks on the indexes
The necessary lock is taken when calling index_drop through
performMultipleDeletions. Do you think it is not enough and that I should
add an exclusive lock inside index_concurrent_drop?
It seems to be safer to acquire it earlier, otherwise the likelihood for
deadlocks seems to be slightly higher as you're increasing the lock
severity. And it shouldn't cause any disadvantages, so ...
Starts to look really nice now!
Isn't the following block content that's mostly available somewhere else
already?
+  <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+   <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+   <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+    <primary>index</primary>
+    <secondary>rebuilding concurrently</secondary>
+   </indexterm>
+
+   <para>
+    Rebuilding an index can interfere with regular operation of a database.
+    Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+    against writes and performs the entire index build with a single scan of the
+    table. Other transactions can still read the table, but if they try to
+    insert, update, or delete rows in the table they will block until the
+    index rebuild is finished. This could have a severe effect if the system is
+    a live production database. Very large tables can take many hours to be
+    indexed, and even for smaller tables, an index rebuild can lock out writers
+    for periods that are unacceptably long for a production system.
+   </para>
...
+   <para>
+    Regular index builds permit other regular index builds on the
+    same table to occur in parallel, but only one concurrent index build
+    can occur on a table at a time. In both cases, no other types of schema
+    modification on the table are allowed meanwhile. Another difference
+    is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+    command can be performed within a transaction block, but
+    <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+    by default not allowed to run inside a transaction block, so in this case
+    <command>CONCURRENTLY</> is not supported.
+   </para>
+
-	if (concurrent && is_exclusion)
+	if (concurrent && is_exclusion && !is_reindex)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
This is what I referred to above wrt reindex and CONCURRENTLY. We
shouldn't pass concurrently if we don't deem it to be safe for exclusion
constraints.
+/*
+ * index_concurrent_drop
+ *
+ * Drop a list of indexes as the last step of a concurrent process. Deletion is
+ * done through performDeletion or dependencies of the index are not dropped.
+ * At this point all the indexes are already considered as invalid and dead so
+ * they can be dropped without using any concurrent options.
+ */
+void
+index_concurrent_drop(List *indexIds)
+{
+	ListCell   *lc;
+	ObjectAddresses *objects = new_object_addresses();
+
+	Assert(indexIds != NIL);
+
+	/* Scan the list of indexes and build object list for normal indexes */
+	foreach(lc, indexIds)
+	{
+		Oid			indexOid = lfirst_oid(lc);
+		Oid			constraintOid = get_index_constraint(indexOid);
+		ObjectAddress object;
+
+		/* Register constraint or index for drop */
+		if (OidIsValid(constraintOid))
+		{
+			object.classId = ConstraintRelationId;
+			object.objectId = constraintOid;
+		}
+		else
+		{
+			object.classId = RelationRelationId;
+			object.objectId = indexOid;
+		}
+
+		object.objectSubId = 0;
+
+		/* Add object to list */
+		add_exact_object_address(&object, objects);
+	}
+
+	/* Perform deletion for normal and toast indexes */
+	performMultipleDeletions(objects,
+							 DROP_RESTRICT,
+							 0);
+}
Just for warm and fuzzy feeling I think it would be a good idea to
recheck that indexes are !indislive here.
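The flags in question live in pg_index and can be inspected directly; a sketch (the _cct suffix is the one the patch uses for the transient indexes):

```sql
-- Any index about to be dropped by the concurrent process should show
-- indisvalid = false and indislive = false here.
SELECT c.relname, i.indisvalid, i.indisready, i.indislive
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname LIKE '%\_cct';
```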
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 5e8c6da..55c092d 100644
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign keys references for a given index to a new index created
+ * concurrently. This process is used when swapping indexes for a concurrent
+ * process. All the constraints that are not referenced externally like primary
+ * keys or unique indexes should be switched using the structure of index.c for
+ * concurrent index creation and drop.
+ * This function takes care of also switching the dependencies of the foreign
+ * key from the old index to the new index in pg_depend.
+ *
+ * In order to complete this process, the following process is done:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to the
+ *    parent relation of the index being swapped as conrelid.
+ * 2) Check in this list the foreign keys that use the old index as reference
+ *    here with conindid
+ * 3) Update field conindid to the new index Oid on all the foreign keys
+ * 4) Switch dependencies of the foreign key to the new index
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+								  Oid oldIndexOid,
+								  Oid newIndexOid)
+{
+	ScanKeyData skey[1];
+	SysScanDesc conscan;
+	Relation	conRel;
+	HeapTuple	htup;
+
+	/*
+	 * Search pg_constraint for the foreign key constraints associated
+	 * with the index by scanning using conrelid.
+	 */
+	ScanKeyInit(&skey[0],
+				Anum_pg_constraint_confrelid,
+				BTEqualStrategyNumber, F_OIDEQ,
+				ObjectIdGetDatum(parentOid));
+
+	conRel = heap_open(ConstraintRelationId, AccessShareLock);
+	conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+								 true, SnapshotNow, 1, skey);
+
+	while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+	{
+		Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+		/* Check if a foreign constraint uses the index being swapped */
+		if (contuple->contype == CONSTRAINT_FOREIGN &&
+			contuple->confrelid == parentOid &&
+			contuple->conindid == oldIndexOid)
+		{
+			/* Found an index, so update its pg_constraint entry */
+			contuple->conindid = newIndexOid;
+			/* And write it back in place */
+			heap_inplace_update(conRel, htup);
I am pretty doubtful that using heap_inplace_update is the correct thing
to do here. What if we fail later? Even if there's some justification
for it being safe it deserves a big comment.
The other cases where heap_inplace_update is used in the context of
CONCURRENTLY are pretty careful about where to do it and have special
state flags of indicating that this has been done...
+bool
+ReindexRelationsConcurrently(List *relationIds)
+{
+	foreach(lc, relationIds)
+	{
+		Oid			relationOid = lfirst_oid(lc);
+
+		switch (get_rel_relkind(relationOid))
+		{
+			case RELKIND_RELATION:
+			{
+				/*
+				 * In the case of a relation, find all its indexes
+				 * including toast indexes.
+				 */
+				Relation	heapRelation = heap_open(relationOid,
+													 ShareUpdateExclusiveLock);
+
+				/* Relation on which is based index cannot be shared */
+				if (heapRelation->rd_rel->relisshared)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("concurrent reindex is not supported for shared relations")));
+
+				/* Add all the valid indexes of relation to list */
+				foreach(lc2, RelationGetIndexList(heapRelation))
+				{
+					Oid			cellOid = lfirst_oid(lc2);
+					Relation	indexRelation = index_open(cellOid,
+														   ShareUpdateExclusiveLock);
+
+					if (!indexRelation->rd_index->indisvalid)
+						ereport(WARNING,
+								(errcode(ERRCODE_INDEX_CORRUPTED),
+								 errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
+										get_namespace_name(get_rel_namespace(cellOid)),
+										get_rel_name(cellOid))));
+					else
+						indexIds = list_append_unique_oid(indexIds,
+														  cellOid);
+
+					index_close(indexRelation, ShareUpdateExclusiveLock);
+				}
Why are we releasing the locks here if we are going to reindex the
relations? They might change in between. I think we should take an
appropriate lock here, including the locks on the parent relations. Yes,
it's slightly more duplicative code, and not acquiring locks multiple
times is somewhat complicated, but I think it's required.
I think you should also explicitly do the above in a transaction...
+	/*
+	 * Phase 2 of REINDEX CONCURRENTLY
+	 *
+	 * Build concurrent indexes in a separate transaction for each index to
+	 * avoid having open transactions for an unnecessary long time. We also
+	 * need to wait until no running transactions could have the parent table
+	 * of index open. A concurrent build is done for each concurrent
+	 * index that will replace the old indexes.
+	 */
+
+	/* Get the first element of concurrent index list */
+	lc2 = list_head(concurrentIndexIds);
+
+	foreach(lc, indexIds)
+	{
+		Relation	indexRel;
+		Oid			indOid = lfirst_oid(lc);
+		Oid			concurrentOid = lfirst_oid(lc2);
+		Oid			relOid;
+		bool		primary;
+		LOCKTAG    *heapLockTag = NULL;
+		ListCell   *cell;
+
+		/* Move to next concurrent item */
+		lc2 = lnext(lc2);
+
+		/* Start new transaction for this index concurrent build */
+		StartTransactionCommand();
+
+		/* Get the parent relation Oid */
+		relOid = IndexGetRelation(indOid, false);
+
+		/*
+		 * Find the locktag of parent table for this index, we need to wait for
+		 * locks on it.
+		 */
+		foreach(cell, lockTags)
+		{
+			LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+			if (relOid == localTag->locktag_field2)
+				heapLockTag = localTag;
+		}
+
+		Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+		WaitForVirtualLocks(*heapLockTag, ShareLock);
Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
once for all relations after each phase? Otherwise the waiting time will
really start to hit when you do this on a somewhat busy server.
+		/*
+		 * Invalidate the relcache for the table, so that after this commit all
+		 * sessions will refresh any cached plans that might reference the index.
+		 */
+		CacheInvalidateRelcacheByRelid(relOid);
I am not sure whether I suggested adding a
CacheInvalidateRelcacheByRelid here, but afaics it's not required yet,
the plan isn't valid yet, so no need for replanning.
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
I wonder whether we should directly open it exclusive here given it's going
to be opened exclusively in a bit anyway. Not that that will really reduce the
opened exclusively in a bit anyway. Not that that will really reduce the
deadlock likelihood since we already hold the ShareUpdateExclusiveLock
in session mode ...
+	/*
+	 * Phase 5 of REINDEX CONCURRENTLY
+	 *
+	 * The old indexes need to be marked as not ready. We need also to wait for
+	 * transactions that might use them. Each operation is performed with a
+	 * separate transaction.
+	 */
+
+	/* Mark the old indexes as not ready */
+	foreach(lc, indexIds)
+	{
+		LOCKTAG    *heapLockTag;
+		Oid			indOid = lfirst_oid(lc);
+		Oid			relOid;
+
+		StartTransactionCommand();
+		relOid = IndexGetRelation(indOid, false);
+
+		/*
+		 * Find the locktag of parent table for this index, we need to wait for
+		 * locks on it.
+		 */
+		foreach(lc2, lockTags)
+		{
+			LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+			if (relOid == localTag->locktag_field2)
+				heapLockTag = localTag;
+		}
+
+		Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+		/* Finish the index invalidation and set it as dead */
+		index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+		/* Commit this transaction to make the update visible. */
+		CommitTransactionCommand();
+	}
No waiting here?
+	StartTransactionCommand();
+
+	/* Get fresh snapshot for next step */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	/*
+	 * Phase 6 of REINDEX CONCURRENTLY
+	 *
+	 * Drop the old indexes. This needs to be done through performDeletion
+	 * or related dependencies will not be dropped for the old indexes. The
+	 * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
+	 * indexes are already considered as dead and invalid, so they will not
+	 * be used by other backends.
+	 */
+	index_concurrent_drop(indexIds);
+
+	/*
+	 * Last thing to do is release the session-level lock on the parent table
+	 * and the indexes of table.
+	 */
+	foreach(lc, relationLocks)
+	{
+		LockRelId	lockRel = *(LockRelId *) lfirst(lc);
+		UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+	}
+
+	/* We can do away with our snapshot */
+	PopActiveSnapshot();
I think I would do the drop in individual transactions as well.
More at another time, shouldn't have started doing this now...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
OK. I am back to this patch after a too long time.
Please find an updated version of the patch attached (v6). It addresses all
the previous comments, except for the support of REINDEX DATABASE
CONCURRENTLY. I am working on that precisely but I am not sure it is that
straightforward...
On Wed, Dec 19, 2012 at 11:24 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2012-12-17 11:44:00 +0900, Michael Paquier wrote:
Thanks for all your comments.
The new version (v5) of this patch fixes the error you found when
reindexing indexes being referenced in foreign keys.
The fix is done with switchIndexConstraintOnForeignKey in pg_constraint.c, in
charge of scanning pg_constraint for foreign keys that refer to the parent
relation (confrelid) of the index being swapped, and then switching conindid
to the new index if the old index was referenced.
This API also takes care of switching the dependency between the foreign
key and the old index by calling changeDependencyFor.
I also added a regression test for this purpose.

Ok. Are there no other dependencies on indexes? I don't know of any
right now, but I have the feeling there were some other cases.
The patch covers the cases of PRIMARY KEY, UNIQUE and normal indexes,
exclusion constraints and foreign keys. Just based on the docs, I don't
think there is anything missing.
http://www.postgresql.org/docs/9.2/static/ddl-constraints.html
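As a quick map of what that list covers, pg_constraint tags each constraint kind in contype; a sketch:

```sql
-- contype codes in pg_constraint: 'p' primary key, 'u' unique,
-- 'f' foreign key, 'c' check, 'x' exclusion, 't' constraint trigger.
SELECT contype, count(*)
FROM pg_constraint
GROUP BY contype
ORDER BY contype;
```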
* REINDEX DATABASE .. CONCURRENTLY doesn't work, a variant that does the
concurrent reindexing for user-tables and non-concurrent for system
tables would be very useful. E.g. for the upgrade from 9.1.5->9.1.6...
OK. I thought that this was out of scope for the time being. I haven't
done
anything about that yet. Supporting that will not be complicated as
ReindexRelationsConcurrently (new API) is more flexible now, the only thing
needed is to gather the list of relations that need to be reindexed.
Imo that so greatly reduces the usability of this patch that you should
treat it as in scope ;). Especially as you say, it really shouldn't be
that much work with all the groundwork built.
OK. So... What should we do when a REINDEX DATABASE CONCURRENTLY is done?
- only reindex user tables and bypass system tables?
- reindex user tables concurrently and system tables non-concurrently?
- forbid this operation when this operation is done on a database having
system tables?
Some input?
Btw, the attached version of the patch does not include this feature yet
but I am working on it.
* would be nice (but that's probably a step #2 thing) to do the
individual steps of concurrent reindex over multiple relations to
avoid too much overall waiting for other transactions.

I think I did that by now using one transaction per index for each
operation except the drop phase...

Without yet having read the new version, I think that's not what I
meant. There currently is a wait for concurrent transactions to end
after most of the phases for every relation, right? If you have a busy
database with somewhat long-running transactions that's going to slow
everything down with waiting quite a bit. I wondered whether it would make
sense to do PHASE1 for all indexes in all relations, then wait once,
then PHASE2...
That obviously has some space and index maintenance overhead issues, but
it's probably sensible anyway in many cases.
OK, phase 1 is done with only one transaction for all the indexes. Do you
mean that we should do that with a single transaction for each index?
Isn't the following block content that's mostly available somewhere else
already?
[... doc extract ...]
Yes, this portion of the docs is pretty similar to what can be found in
CREATE INDEX CONCURRENTLY. Why not create a new common documentation
section that CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY could refer
to? I think we should first work on the code and then do the docs properly
though.
-	if (concurrent && is_exclusion)
+	if (concurrent && is_exclusion && !is_reindex)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
This is what I referred to above wrt reindex and CONCURRENTLY. We
shouldn't pass concurrently if we don't deem it to be safe for exclusion
constraints.
So does that mean that it is not possible to create an exclusion constraint
in a concurrent context? The code path used by REINDEX CONCURRENTLY creates
an index in parallel with an existing one, not a completely new index.
Shouldn't this also work for the indexes used by exclusion constraints?
+/*
+ * index_concurrent_drop
+ *
+ * Drop a list of indexes as the last step of a concurrent process. Deletion is
+ * done through performDeletion or dependencies of the index are not dropped.
+ * At this point all the indexes are already considered as invalid and dead so
+ * they can be dropped without using any concurrent options.
+ */
[...]
+	/* Perform deletion for normal and toast indexes */
+	performMultipleDeletions(objects,
+							 DROP_RESTRICT,
+							 0);
+}

Just for warm and fuzzy feeling I think it would be a good idea to
recheck that indexes are !indislive here.
OK, done. Indexes that still have indislive set to true are now bypassed.
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 5e8c6da..55c092d 100644
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign keys references for a given index to a new index created
+ * concurrently.
[...]
+			/* Found an index, so update its pg_constraint entry */
+			contuple->conindid = newIndexOid;
+			/* And write it back in place */
+			heap_inplace_update(conRel, htup);

I am pretty doubtful that using heap_inplace_update is the correct thing
to do here. What if we fail later? Even if there's some justification
for it being safe it deserves a big comment.

The other cases where heap_inplace_update is used in the context of
CONCURRENTLY are pretty careful about where to do it and have special
state flags indicating that this has been done...
Oops, fixed. I changed it to simple_heap_update.
+bool
+ReindexRelationsConcurrently(List *relationIds)
+{
+	foreach(lc, relationIds)
+	{
+		Oid			relationOid = lfirst_oid(lc);
+
[...]
+					index_close(indexRelation, ShareUpdateExclusiveLock);
+				}
Why are we releasing the locks here if we are going to reindex the
relations? They might change inbetween. I think we should take an
appropriate lock here, including the locks on the parent relations. Yes,
its slightly more duplicative code, and not acquiring locks multiple
times is somewhat complicated, but I think its required.
OK, the locks are now kept until the end of the transaction, and session
locks are taken on those relations, so it will not be possible to have
schema changes between the moment the list of indexes is built and the
moment the session locks are taken.
I think you should also explicitly do the above in a transaction...
I am not sure I get your point here. This phase is in place to gather the
list of all the indexes to reindex based on the list of relations given by
caller.
+	/*
+	 * Phase 2 of REINDEX CONCURRENTLY
+	 *
+	 * Build concurrent indexes in a separate transaction for each index to
+	 * avoid having open transactions for an unnecessary long time. We also
+	 * need to wait until no running transactions could have the parent table
+	 * of index open.
+	 */
[...]
+		Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+		WaitForVirtualLocks(*heapLockTag, ShareLock);
Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
once for all relations after each phase? Otherwise the waiting time will
really start to hit when you do this on a somewhat busy server.
Each new index is built and set as ready in a separate transaction, so it
makes sense to wait for the parent relation each time. It would be possible
to wait for a parent relation only once during this phase, but in that case
all the indexes of the same relation would need to be set as ready in the
same transaction. So the choice is either to wait for the same relation
multiple times, once per index, or to wait once per relation but build all
the concurrent indexes within the same transaction. Choice 1 makes the code
clearer and more robust to my mind, as phase 2 is done clearly for each
index separately. Thoughts?
+		/*
+		 * Invalidate the relcache for the table, so that after this commit all
+		 * sessions will refresh any cached plans that might reference the index.
+		 */
+		CacheInvalidateRelcacheByRelid(relOid);

I am not sure whether I suggested adding a
CacheInvalidateRelcacheByRelid here, but afaics it's not required yet,
the plan isn't valid yet, so no need for replanning.
Sure I removed it.
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
I wonder whether we should directly open it exclusive here given it's going
to be opened exclusively in a bit anyway. Not that that will really reduce
the deadlock likelihood since we already hold the ShareUpdateExclusiveLock
in session mode ...
I tried to use an AccessExclusiveLock here but it happens that this is not
compatible with index_set_state_flags. Does taking an exclusive lock
increment the transaction ID of the running transaction? Because what I am
seeing is that taking an AccessExclusiveLock on this index does a
transaction update.
For those reasons the current code sticks with ShareUpdateExclusiveLock.
Not a big deal btw...
+	/*
+	 * Phase 5 of REINDEX CONCURRENTLY
+	 *
+	 * The old indexes need to be marked as not ready. We need also to wait for
+	 * transactions that might use them. Each operation is performed with a
+	 * separate transaction.
+	 */
[...]
+		/* Finish the index invalidation and set it as dead */
+		index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+		/* Commit this transaction to make the update visible. */
+		CommitTransactionCommand();
+	}

No waiting here?
A wait phase is done inside index_concurrent_set_dead, so no problem.
+	StartTransactionCommand();
+
+	/* Get fresh snapshot for next step */
+	PushActiveSnapshot(GetTransactionSnapshot());
+
+	/*
+	 * Phase 6 of REINDEX CONCURRENTLY
+	 *
+	 * Drop the old indexes. This needs to be done through performDeletion
+	 * or related dependencies will not be dropped for the old indexes.
+	 */
[...]
+	/* We can do away with our snapshot */
+	PopActiveSnapshot();

I think I would do the drop in individual transactions as well.
Done. Each drop is now done in its own transaction.
--
Michael Paquier
http://michael.otacoo.com
Attachments:
20130115_reindex_concurrently_v6.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..ba13703 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,10 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will not perform a concurrent build if <literal>
+ CONCURRENTLY</> is not specified. To build the index without interfering
+ with production you should drop the index and reissue the <command>CREATE
+ INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> command.
</para>
</listitem>
@@ -139,6 +140,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +247,93 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index that will replace the one to
+ be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to build the new
+ index and make it valid for the other backends. Once this is performed,
+ the old and fresh indexes are swapped in, and the old index is marked as
+ invalid in a third transaction. Finally two additional transactions are
+ used to mark the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the invalid
+ index and perform <command>REINDEX CONCURRENTLY</> once again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ </refsect2>
</refsect1>
<refsect1>
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 2632058..6269092 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2642,7 +2642,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5892e44..1a00589 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -42,6 +42,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -671,6 +672,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be a toast relation. Sufficient locks are normally taken on
+ * the related relations once this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -694,7 +699,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -737,19 +743,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1083,7 +1093,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1095,6 +1105,397 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so that only schema changes are prevented.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /*
+ * Now build the index. If the parent relation is a toast relation, its
+ * reltoastidxid is updated when calling index_concurrent_swap.
+ */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new index in a concurrent context. For the time being
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ * For toast indexes, it is also necessary to modify reltoastidxid of the parent
+ * relation, so we also need to take RowExclusiveLock in this case until the
+ * end of the transaction block for this relation.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel;
+
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having index swapping relying on relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName(nameOld,
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * If the index swapped is a toast index, take an exclusive lock on its
+ * parent toast relation and then update reltoastidxid to the new index Oid
+ * value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ Relation parentRel;
+
+ /* Open the parent toast relation */
+ parentRel = heap_open(parentOid, RowExclusiveLock);
+
+ /* Update the pg_class entry of the parent with the new toast index Oid */
+ index_update_stats(parentRel, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(parentRel, RowExclusiveLock);
+ }
+
+ /*
+ * Scan for potential foreign keys on the index being swapped and change its
+ * dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query. To do this, inquire which xacts currently would
+ * conflict with AccessExclusiveLock on the table -- ie, which ones
+ * have a lock of any kind on the table. Then wait for each of these
+ * xacts to commit or abort. Note we do not need to worry about xacts
+ * that open the table for reading after this point; they will see the
+ * index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need
+ * not check for that. Also, prepared xacts are not reported, which
+ * is fine since they certainly aren't going to do anything more.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Release the valid state of a given index and then release the cache of
+ * its parent relation. This function should be called when initializing an
+ * index drop in a concurrent context before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion is done through performDeletion, otherwise the
+ * dependencies of the index would not be dropped. At this point the index
+ * is already considered invalid and dead, so it can be dropped without
+ * using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index dropped here is not alive, it might be used by
+ * other backends in this case.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1325,7 +1726,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1407,17 +1807,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1445,63 +1836,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1514,13 +1850,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1942,6 +2272,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * of the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1955,7 +2287,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -2070,7 +2403,8 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
+ (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) &&
+ istoastupdate ?
RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
@@ -3188,7 +3522,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign keys references for a given index to a new index created
+ * concurrently. This process is used when swapping indexes for a concurrent
+ * process. All the constraints that are not referenced externally like primary
+ * keys or unique indexes should be switched using the structure of index.c for
+ * concurrent index creation and drop.
+ * This function takes care of also switching the dependencies of the foreign
+ * key from the old index to the new index in pg_depend.
+ *
+ * In order to complete this process, the following process is done:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to the
+ * parent relation of the index being swapped as confrelid.
+ * 2) Check in this list the foreign keys that use the old index as reference
+ * here with conindid
+ * 3) Update field conindid to the new index Oid on all the foreign keys
+ * 4) Switch dependencies of the foreign key to the new index
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated
+ * with the index by scanning using confrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 94efd13..0da145f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,570 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationsConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given list of relation Oids. The list
+ * of indexes to rebuild is extracted from this list, whose elements can be
+ * either relations or indexes.
+ * Each reindexing step is done simultaneously for all the extracted indexes.
+ */
+bool
+ReindexRelationsConcurrently(List *relationIds)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * list of relation Oids given by the caller. For each element in the
+ * given list, if the relkind of the given relation Oid is a table, all
+ * its valid indexes will be rebuilt, including the indexes of its toast
+ * table, if any. If the relkind is an index, this index itself will be
+ * rebuilt. The locks taken on parent relations and involved indexes are
+ * kept until this transaction is committed, to protect against schema
+ * changes that might occur before the session lock is taken on each
+ * relation.
+ */
+ foreach(lc, relationIds)
+ {
+ Oid relationOid = lfirst_oid(lc);
+
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds,
+ cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", bypassing",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds, relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ /*
+ * Build a unique list of parent relation Oids based on the extracted index
+ * list. This list of Oids is used to take session locks on the parent
+ * relations of indexes to prevent a concurrent drop of the relations
+ * involved in the concurrent reindex.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid parentOid = IndexGetRelation(lfirst_oid(lc), false);
+ parentRelationIds = list_append_unique_oid(parentRelationIds, parentOid);
+ }
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. For
+ * each index, we first need to create a new index based on the same
+ * definition as the former one, except that it is only registered in
+ * the catalogs and will be built afterwards. All these operations can
+ * be done at the same time for all the indexes of a parent relation,
+ * including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; it might be a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also needed
+ * on it
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save a palloc'd copy of each lock id to protect the concurrent
+ * relations from being dropped, then close the relations. Pointers to
+ * the local lockrelid variable must not be stored in the list, as it
+ * is overwritten at each iteration. The lock id of the parent relation
+ * is not saved here to avoid taking multiple locks on the same
+ * relation; we rely on parentRelationIds built earlier instead.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock tags for the visibility checks that follow; other
+ * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a palloc'd copy of the parent relation's lock id to the list */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the
+ * concurrent index and its copy to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build concurrent indexes in a separate transaction for each index to
+ * avoid keeping transactions open for an unnecessarily long time. We also
+ * need to wait until no running transactions could have the parent table
+ * of index open. A concurrent build is done for each concurrent
+ * index that will replace the old indexes.
+ */
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Perform the concurrent build of the new index, using relOid rather
+ * than indexRel->rd_index->indrelid, as indexRel has been closed above
+ * and must not be dereferenced anymore.
+ */
+ index_concurrent_build(relOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked as valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open for
+ * an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(lc2, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent
+ * index validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * This concurrent index is now valid, as it contains all the necessary
+ * tuples. However, it might not have taken into account tuples deleted
+ * before the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause other backends to update their entries
+ * for the concurrent index, but we also need to invalidate the relcache.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation to invalidate the cache of
+ * the associated relation. ShareUpdateExclusiveLock is taken here, and
+ * not a stronger lock, to reduce the likelihood of deadlock, as it is
+ * already taken at session level.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait for
+ * transactions that might use them. Each operation is performed with a
+ * separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ LOCKTAG *heapLockTag = NULL;
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(lc2, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /* Finish the index invalidation and set it as dead */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion
+ * or related dependencies will not be dropped for the old indexes. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
+ * indexes are already considered as dead and invalid, so they will not
+ * be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the old index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index processed, do not commit the transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the parent
+ * table and on the indexes of the table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +2000,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +2027,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2146,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationsConcurrently(list_make1_oid(indOid));
return indOid;
}
@@ -1747,18 +2225,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent && !ReindexRelationsConcurrently(list_make1_oid(heapOid)))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2267,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2282,12 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /* CONCURRENTLY is not allowed when system catalogs are included */
+ if (concurrent && do_system)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 51fdb63..a84a71c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3601,6 +3601,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 4b219b3..1418f50 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1840,6 +1840,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 76ef11e..6855ea5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6670,29 +6670,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..9f6a0f2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,114 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction holds a lock on the given relation that would
+ * conflict with the given lock mode. To do this, inquire which xacts
+ * currently would conflict with this lock mode on the relation referred to
+ * by the LOCKTAG, then wait for each of these xacts to commit or abort.
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index ad5e303..89b4c0d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1255,15 +1255,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1275,8 +1279,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..335a620 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,24 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +105,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a cache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index c327136..3f45483 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationsConcurrently(List *relationIds);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 56cf592..5096fa4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2511,6 +2511,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..0b591ce 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..4053b53
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE reind_con_tab CONCURRENTLY; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..612089c 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,48 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX DATABASE CONCURRENTLY postgres; -- not allowed for DATABASE
+ERROR: cannot reindex system concurrently
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..b77c7a4 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,34 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX DATABASE CONCURRENTLY postgres; -- not allowed for DATABASE
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Hi,
Please find attached v7 of this patch, adding support for REINDEX DATABASE
CONCURRENTLY.
When using REINDEX DATABASE with CONCURRENTLY, non-system tables are
reindexed concurrently and system tables are reindexed in the normal way,
ie non-concurrently.
Thanks,
--
Michael Paquier
http://michael.otacoo.com
Attachment: 20130116_reindex_concurrently_v7.patch
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..b12e684 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,10 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will not perform a concurrent build if <literal>
+ CONCURRENTLY</> is not specified. To build the index without interfering
+ with production you should drop the index and reissue the <command>CREATE
+ INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> command.
</para>
</listitem>
@@ -139,6 +140,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table, whereas a standard index rebuild
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +247,103 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and, in
+ addition, it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete, as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent reindex, a new index that will replace the one to
+ be rebuilt is first entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to build the new
+ index and make it valid for the other backends. Once this is done, the
+ old and new indexes are swapped, and the old index is marked as invalid
+ in a further transaction. Finally, two additional transactions are used
+ to mark the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the invalid
+ index and run <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending
+ with the suffix <literal>_cct</>.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system indexes concurrently. System
+ indexes are rebuilt non-concurrently.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +375,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 2632058..6269092 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2642,7 +2642,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5892e44..1a00589 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -42,6 +42,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -671,6 +672,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that duplicates an existing index,
+ * as done during a concurrent reindex operation. This index can also be
+ * on a toast relation. Sufficient locks are normally already taken on
+ * the related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -694,7 +699,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -737,19 +743,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1083,7 +1093,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1095,6 +1105,397 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so that only schema changes are prevented.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /*
+ * Now build the index. If the parent relation is a toast relation, its
+ * reltoastidxid is updated when index_concurrent_swap is called.
+ */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new index in a concurrent context. For the time being
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ * For toast indexes, it is also necessary to modify reltoastidxid of the
+ * parent relation, so in this case we also need to take RowExclusiveLock on
+ * this relation until the end of the transaction.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel;
+
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having index swapping relying on relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName(nameOld,
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally, rename the old index to the new index's original name */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * If the index swapped is a toast index, take an exclusive lock on its
+ * parent toast relation and then update reltoastidxid to the new index Oid
+ * value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ Relation toastRel;
+
+ /* Open the parent toast relation */
+ toastRel = heap_open(parentOid, RowExclusiveLock);
+
+ /* Update the pg_class entry of this relation with the new toast index Oid */
+ index_update_stats(toastRel, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(toastRel, RowExclusiveLock);
+ }
+
+ /*
+ * Scan for potential foreign keys referencing the index being swapped and
+ * switch their dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query. To do this, inquire which xacts currently would
+ * conflict with AccessExclusiveLock on the table -- ie, which ones
+ * have a lock of any kind on the table. Then wait for each of these
+ * xacts to commit or abort. Note we do not need to worry about xacts
+ * that open the table for reading after this point; they will see the
+ * index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need
+ * not check for that. Also, prepared xacts are not reported, which
+ * is fine since they certainly aren't going to do anything more.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initiating an
+ * index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently, as the last step of a concurrent index
+ * process. Deletion is done through performDeletion; otherwise the
+ * dependencies of the index would not be dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ ListCell *lc;
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+	 * Check that the index being dropped is not alive; if it is, it might
+	 * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1325,7 +1726,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1407,17 +1807,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1445,63 +1836,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1514,13 +1850,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1942,6 +2272,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether the toast index Oid of the parent relation
+ * needs to be updated.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1955,7 +2287,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -2070,7 +2403,8 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
+ (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) &&
+ istoastupdate ?
RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
@@ -3188,7 +3522,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign key references from a given index to a new index created
+ * concurrently. This is used when swapping indexes during a concurrent
+ * process. Constraints that are not referenced externally, like primary
+ * keys or unique indexes, should be switched using the infrastructure of
+ * index.c for concurrent index creation and drop.
+ * This function also takes care of switching the dependencies of the
+ * foreign keys from the old index to the new index in pg_depend.
+ *
+ * The process consists of the following steps:
+ * 1) Scan pg_constraint and extract the list of foreign keys whose
+ *    confrelid refers to the parent relation of the index being swapped.
+ * 2) Check which foreign keys in this list use the old index as their
+ *    supporting index, via conindid.
+ * 3) Update conindid to the new index Oid on all those foreign keys.
+ * 4) Switch the dependencies of those foreign keys to the new index.
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated
+ * with the index by scanning using conrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 94efd13..d72f0e9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,567 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationsConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given list of relation Oids. The
+ * list of indexes to rebuild is extracted from this list, whose elements
+ * can be either relations or indexes.
+ * Each reindexing step is done simultaneously for all the extracted indexes.
+ */
+bool
+ReindexRelationsConcurrently(List *relationIds)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+	 * Extract the list of indexes that are going to be rebuilt based on the
+	 * list of relation Oids given by the caller. For each element of the
+	 * given list, if the relkind of the relation Oid is a table, all its
+	 * valid indexes will be rebuilt, including its associated toast table
+	 * indexes. If the relkind is an index, this index itself will be
+	 * rebuilt. The locks taken on the parent relations and the involved
+	 * indexes are kept until this transaction is committed, to protect
+	 * against schema changes that might occur before the session lock is
+	 * taken on each relation.
+ */
+ foreach(lc, relationIds)
+ {
+ Oid relationOid = lfirst_oid(lc);
+
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+					/* The relation an index is based on cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+							 errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds,
+ cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+								 errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+				 * For an index, simply add its Oid to the list. Invalid
+				 * indexes cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+						 errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds, relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+ }
+
+	/* Definitely no indexes to rebuild, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ /*
+	 * Build a unique list of parent relation Oids based on the extracted
+	 * index list. This list is used to take session locks on the parent
+	 * relations of the indexes, preventing a concurrent drop of the
+	 * relations involved in the concurrent reindex.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid parentOid = IndexGetRelation(lfirst_oid(lc), false);
+ parentRelationIds = list_append_unique_oid(parentRelationIds, parentOid);
+ }
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+	 * Here begins the process of rebuilding the indexes concurrently. We
+	 * first need to create an index based on the same definition as the
+	 * former index, except that it will only be registered in the catalogs
+	 * and built afterwards. All these operations can be performed at the
+	 * same time for all the indexes of a parent relation, including the
+	 * indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+		/* Open the index's parent relation, which might be a plain or toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+		 * Now open the relation of the concurrent index; a lock is also
+		 * needed on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+		/*
+		 * Save the lockrelid to protect each concurrent relation from being
+		 * dropped, then close the relations. Note that palloc'd copies of
+		 * the lockrelid are appended to relationLocks: the list outlives
+		 * this loop iteration, so storing the address of a stack variable
+		 * here would leave dangling pointers. The lockrelid of the parent
+		 * relation is not taken here, to avoid taking multiple locks on the
+		 * same relation; instead we rely on parentRelationIds built earlier.
+		 */
+		lockrelid = indexRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks,
+								memcpy(palloc(sizeof(LockRelId)),
+									   &lockrelid, sizeof(LockRelId)));
+		lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+		relationLocks = lappend(relationLocks,
+								memcpy(palloc(sizeof(LockRelId)),
+									   &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+	 * Save the heap locks for the following visibility checks; other
+	 * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+		/*
+		 * Add the lockrelid of the parent relation to the list of locked
+		 * relations; append a palloc'd copy, since the list outlives this
+		 * loop iteration and the address of a stack variable would dangle.
+		 */
+		relationLocks = lappend(relationLocks,
+								memcpy(palloc(sizeof(LockRelId)),
+									   &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+	 * For a concurrent build, it is necessary to make the catalog entries
+	 * visible to the other transactions before actually building the index.
+	 * This will prevent them from making incompatible HOT updates. The index
+	 * is marked as not ready and invalid so that no other transactions will
+	 * try to use it for INSERT or SELECT.
+	 *
+	 * Before committing, take a session-level lock on the relation, the
+	 * concurrent index and its copy, to ensure that none of them are
+	 * dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+	 * Build the concurrent indexes in a separate transaction for each index
+	 * to avoid having transactions open for an unnecessarily long time. We
+	 * also need to wait until no running transaction could have the parent
+	 * table of an index open. A concurrent build is done for each concurrent
+	 * index that will replace the old indexes.
+ */
+
+	/* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+		 * Find the locktag of the parent table for this index; we need to
+		 * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* The index relation was closed by the previous commit, so reopen it */
+		indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+		primary = indexRel->rd_index->indisprimary;
+		index_close(indexRel, ShareUpdateExclusiveLock);
+
+		/*
+		 * Perform the concurrent build of the new index. relOid is used
+		 * here rather than indexRel->rd_index->indrelid, since indexRel
+		 * must not be dereferenced once it has been closed.
+		 */
+		index_concurrent_build(relOid,
+							   concurrentOid,
+							   primary);
+
+ /*
+		 * Update the pg_index row of the concurrent index to mark it ready
+		 * for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+		 * Commit this transaction to make the indisready update of the
+		 * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+	 * During this phase the concurrent indexes catch up with any INSERTs
+	 * that might have occurred in the parent table, and are marked as valid
+	 * once done.
+	 *
+	 * We once again wait until no transaction can have the table open with
+	 * the index marked as read-only for updates. Each index validation is
+	 * done in a separate transaction to avoid keeping a transaction open
+	 * for an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+		LOCKTAG	   *heapLockTag = NULL;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+		 * Find the locktag of the parent table for this index; we need to
+		 * wait for locks on it.
+ */
+ foreach(lc2, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /*
+		 * Take the reference snapshot that will be used for the validation
+		 * of the concurrent indexes.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+		/* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+		 * The concurrent index can now be marked as valid -- update its
+		 * pg_index entry.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+		 * The concurrent index is now valid, as it contains all the
+		 * necessary tuples. However, it might not contain tuples deleted
+		 * just before the reference snapshot was taken, so we have to wait
+		 * for the transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+		 * The pg_index update will cause backends to update their entries
+		 * for the concurrent index, but the relcache must be invalidated
+		 * as well.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+	 * index is marked as invalid once this is done, making it unusable by
+	 * other backends once the associated transaction is committed.
+ */
+
+	/* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation so that their cache entries
+ * can be invalidated. ShareUpdateExclusiveLock is taken here rather
+ * than a stronger lock to reduce the likelihood of deadlock, as a
+ * ShareUpdateExclusiveLock is already taken within the session.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for transactions that might use them. Each operation is performed in a
+ * separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ LOCKTAG *heapLockTag = NULL;
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(lc2, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /* Finish the index invalidation and set it as dead */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion,
+ * or the dependencies of the old indexes will not be dropped. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used, as here the
+ * indexes are already considered dead and invalid, so they will not
+ * be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this old index */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index treated, do not commit the transaction yet. This
+ * will be done once all the locks on the indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the parent
+ * table and on the indexes of the table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1997,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +2024,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2143,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationsConcurrently(list_make1_oid(indOid));
return indOid;
}
@@ -1747,18 +2222,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent && !ReindexRelationsConcurrently(list_make1_oid(heapOid)))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2264,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2279,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed on system catalogs, but it is
+ * allowed on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2370,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with a normal process, as
+ * they could be corrupted and the concurrent process might itself use
+ * them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationsConcurrently(list_make1_oid(relid));
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 51fdb63..a84a71c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3601,6 +3601,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 4b219b3..1418f50 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1840,6 +1840,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 76ef11e..6855ea5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6670,29 +6670,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..9f6a0f2 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,114 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction can have the table open with the index marked as
+ * read-only for updates.
+ * To do this, inquire which xacts currently would conflict with ShareLock on
+ * the table referred to by the LOCKTAG -- ie, which ones have a lock that permits
+ * writing the table. Then wait for each of these xacts to commit or abort.
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index ad5e303..89b4c0d 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1255,15 +1255,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1275,8 +1279,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..335a620 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,24 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +105,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a cache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 7de6d5d..5f7c0d9 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationsConcurrently(List *relationIds);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 56cf592..5096fa4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2511,6 +2511,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..0b591ce 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..4053b53
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE reind_con_tab CONCURRENTLY; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
OK. I am back to this patch after too long a time.

Ditto ;)
* would be nice (but that's probably a step #2 thing) to do the
individual steps of concurrent reindex over multiple relations to
avoid too much overall waiting for other transactions.

I think I did that by now using one transaction per index for each
operation except the drop phase...

Without yet having read the new version, I think that's not what I
meant. There currently is a wait for concurrent transactions to end
after most of the phases for every relation, right? If you have a busy
database with somewhat long-running transactions, that's going to slow
everything down with waiting quite a bit. I wondered whether it would
make sense to do PHASE1 for all indexes in all relations, then wait once,
then PHASE2...

That obviously has some space and index maintenance overhead issues, but
it's probably sensible anyway in many cases.
OK, phase 1 is done with only one transaction for all the indexes. Do you
mean that we should do that with a single transaction for each index?
Yes.
Isn't the following block content that's mostly available somewhere else
already?
[... doc extract ...]

Yes, this portion of the docs is pretty similar to what can be found in
CREATE INDEX CONCURRENTLY. Why not create a new common documentation
section that CREATE INDEX CONCURRENTLY and REINDEX CONCURRENTLY could refer
to? I think we should first work on the code and then do the docs properly
though.
Agreed. I just noticed it when scrolling through the patch.
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
This is what I referred to above wrt reindex and CONCURRENTLY. We
shouldn't pass concurrent if we don't deem it to be safe for exclusion
constraints.

So does that mean that it is not possible to create an exclusion constraint
in a concurrent context?
Yes, it's currently not safe in the general case.
The code path used by REINDEX CONCURRENTLY creates an index in parallel
with an existing one, not a completely new index. Shouldn't this work for
indexes used by exclusion constraints also?
But that fact might save things. I don't immediately see any reason that
adding a
if (!indisvalid)
return;
to check_exclusion_constraint wouldn't be sufficient if there's another
index with an equivalent definition.
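A self-contained sketch of that idea (the struct and function names here are hypothetical stand-ins, not PostgreSQL's actual check_exclusion_constraint machinery; only the indisvalid flag mirrors the real catalog field): an index that is not marked valid is skipped entirely, on the assumption that an equivalent valid index enforces the constraint.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Minimal stand-in for the relevant bits of an index's catalog state.
 * Only indisvalid mirrors PostgreSQL; the rest is a toy model.
 */
typedef struct IndexState
{
    bool indisvalid;    /* is the index complete and usable? */
    int  conflicts;     /* pretend scan result: conflicting rows found */
} IndexState;

/*
 * Sketch of the proposed guard: an invalid index (such as the half-built
 * _cct index during REINDEX CONCURRENTLY) is skipped, so the check
 * trivially passes against it.
 */
static bool
exclusion_check_passes(const IndexState *index)
{
    if (!index->indisvalid)
        return true;    /* nothing to verify against an invalid index */
    return index->conflicts == 0;
}
```

Skipping is only safe because REINDEX CONCURRENTLY guarantees another index with an equivalent definition exists and is enforcing the constraint while the new one is being built.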
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build concurrent indexes in a separate transaction for each index to
+ * avoid having open transactions for an unnecessarily long time. We also
+ * need to wait until no running transactions could have the parent table
+ * of the index open. A concurrent build is done for each concurrent
+ * index that will replace the old indexes.
+ */
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
once for all relations after each phase? Otherwise the waiting time will
really start to hit when you do this on a somewhat busy server.

Each new index is built and set as ready in a separate single transaction,
so doesn't it make sense to wait for the parent relation each time? It is
possible to wait for a parent relation only once during this phase, but in
this case all the indexes of the same relation need to be set as ready in
the same transaction. So here the choice is either to wait for the same
relation multiple times, once per index, or to wait once for a parent
relation but build all the concurrent indexes within the same transaction.
Choice 1 makes the code clearer and more robust to my mind, as phase 2 is
done clearly for each index separately. Thoughts?
As far as I understand that code, its purpose is to enforce that all
potential users have an up-to-date definition available. For that we
acquire a lock on all virtualxids of users using that table, thus waiting
for them to finish.
Consider the scenario where you have a workload where most transactions
are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
a_2, b_1, b_2). With the current strategy you will do:
WaitForVirtualLocks(a_1) -- wait up to 10min
index_build(a_1)
WaitForVirtualLocks(a_2) -- wait up to 10min
index_build(a_2)
...
So instead of waiting up to 10 minutes for that phase you have to wait up
to 40.
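To make the arithmetic above concrete, here is a toy model in plain standalone C (not PostgreSQL code; "wait" stands in for WaitForVirtualLocks(), and the worst case assumed is that a long-running transaction is always in flight when a wait begins):

```c
#include <assert.h>

/*
 * Toy model of the waiting strategies discussed above.
 * txn_minutes is the assumed worst-case length of a concurrent
 * transaction that the wait must outlast.
 */

/* One wait per index: the worst-case waits add up across indexes. */
static int
worst_wait_per_index(int n_indexes, int txn_minutes)
{
    return n_indexes * txn_minutes;
}

/* One wait per phase: all indexes share a single worst-case wait. */
static int
worst_wait_per_phase(int txn_minutes)
{
    return txn_minutes;
}
```

With the four indexes (a_1, a_2, b_1, b_2) and 10-minute transactions from the example, the per-index strategy can wait up to 40 minutes in one phase versus 10 for the batched strategy.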
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
I wonder whether we should directly open it exclusive here, given it's
going to be opened exclusively in a bit anyway. Not that that will really
reduce the deadlock likelihood, since we already hold the
ShareUpdateExclusiveLock in session mode ...

I tried to use an AccessExclusiveLock here but it happens that this is not
compatible with index_set_state_flags. Does taking an exclusive lock
increment the transaction ID of the running transaction? Because what I am
seeing is that taking AccessExclusiveLock on this index does a transaction
update.
Yep, it does when wal_level = hot_standby, because it logs the exclusive
lock to WAL so the startup process on the standby can acquire it.
IMO that Assert needs to be moved to the existing call sites if there
isn't an equivalent one already.
For those reasons the current code sticks with ShareUpdateExclusiveLock.
Not a big deal btw...
Well, lock upgrades make deadlocks more likely.
OK, off to v7:
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
...
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having the index swap rely on the relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
..
+ /*
+ * If the index swapped is a toast index, take an exclusive lock on its
+ * parent toast relation and then update reltoastidxid to the new index
+ * Oid value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ Relation pg_class;
+
+ /* Open pg_class and fetch a writable copy of the relation tuple */
+ pg_class = heap_open(parentOid, RowExclusiveLock);
+
+ /* Update the statistics of this pg_class entry with new toast index Oid */
+ index_update_stats(pg_class, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(pg_class, RowExclusiveLock);
+ }
ISTM the RowExclusiveLock on the toast table should be acquired before
the locks on the indexes.
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query. To do this, inquire which xacts currently would
+ * conflict with AccessExclusiveLock on the table -- ie, which ones
+ * have a lock of any kind on the table. Then wait for each of these
+ * xacts to commit or abort. Note we do not need to worry about xacts
+ * that open the table for reading after this point; they will see the
+ * index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need
+ * not check for that. Also, prepared xacts are not reported, which
+ * is fine since they certainly aren't going to do anything more.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
Most of that comment seems to belong to WaitForVirtualLocks instead of
this specific caller of WaitForVirtualLocks.
A comment in the header that it is doing the waiting would also be good.
In ReindexRelationsConcurrently I suggest s/bypassing/skipping/.
Btw, seeing that we have an indisvalid check on the toast table's index, do
we have any way to clean up such a dead index? I don't think it's allowed
to drop the index of a toast table. I.e. we possibly need to relax that
check for invalid indexes :/.
I think the usage of list_append_unique_oids in
ReindexRelationsConcurrently might get too expensive in larger
schemas. It's O(n^2) in the current usage and schemas with lots of
relations/indexes aren't unlikely candidates for this feature.
The easiest solution probably is to use a hashtable.
ReindexRelationsConcurrently should do a CHECK_FOR_INTERRUPTS() every
once in a while, it's currently not gracefully interruptible which
probably is bad in a bigger schema.
That's all I have for now.
This patch is starting to look seriously cool and it seems realistic to
get into a ready state for 9.3.
I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
here (for the listeners: swapping the indexes acquires exclusive locks),
but I don't see any other naming being better.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund wrote:
I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
here (for the listeners: swapping the indexes acquires exclusive locks),
but I don't see any other naming being better.
REINDEX ALMOST CONCURRENTLY?
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 24/01/13 07:45, Alvaro Herrera wrote:
Andres Freund wrote:
I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
here (for the listeners: swapping the indexes acquires exclusive locks),
but I don't see any other naming being better.

REINDEX ALMOST CONCURRENTLY?
REINDEX BEST EFFORT CONCURRENTLY?
On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Andres Freund wrote:
I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
here (for the listeners: swapping the indexes acquires exclusive locks),
but I don't see any other naming being better.

REINDEX ALMOST CONCURRENTLY?
I'm kind of unconvinced of the value proposition of this patch. I
mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
today, so ... how is this better?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote:
On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Andres Freund wrote:
I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
here (for the listeners: swapping the indexes acquires exclusive locks),
but I don't see any other naming being better.

REINDEX ALMOST CONCURRENTLY?
I'm kind of unconvinced of the value proposition of this patch. I
mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
today, so ... how is this better?
This has been on the TODO list for a while, and I don't think the
renaming in a transaction work needed to use drop/create is really
something we want to force on users. In addition, doing that for all
tables in a database is even more work, so I would be disappointed _not_
to get this feature in 9.3.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes:
On Thu, Jan 24, 2013 at 01:29:56PM -0500, Robert Haas wrote:
I'm kind of unconvinced of the value proposition of this patch. I
mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
today, so ... how is this better?
This has been on the TODO list for a while, and I don't think the
renaming in a transaction work needed to use drop/create is really
something we want to force on users. In addition, doing that for all
tables in a database is even more work, so I would be disappointed _not_
to get this feature in 9.3.
I haven't given the current patch a look, but based on previous
discussions, this isn't going to be more than a macro for things that
users can do already --- that is, it's going to be basically DROP
CONCURRENTLY plus CREATE CONCURRENTLY plus ALTER INDEX RENAME, including
the fact that the RENAME step will transiently need an exclusive lock.
(If that's not what it's doing, it's broken.) So there's some
convenience argument for it, but it's hardly amounting to a stellar
improvement.
I'm kind of inclined to put it off till after we fix the SnapshotNow
race condition problems; at that point it should be possible to do
REINDEX CONCURRENTLY more simply and without any exclusive lock
anywhere.
regards, tom lane
On 2013-01-24 13:29:56 -0500, Robert Haas wrote:
On Wed, Jan 23, 2013 at 1:45 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Andres Freund wrote:
I somewhat dislike the fact that CONCURRENTLY isn't really concurrent
here (for the listeners: swapping the indexes acquires exclusive locks),
but I don't see any other naming being better.

REINDEX ALMOST CONCURRENTLY?
I'm kind of unconvinced of the value proposition of this patch. I
mean, you can DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY
today, so ... how is this better?
In the wake of beb850e1d873f8920a78b9b9ee27e9f87c95592f I wrote a script
to do this and it really is harder than one might think:
* you cannot do it in the database as CONCURRENTLY cannot be used in a
TX
* you cannot do it to toast tables (this is currently broken in the
patch but should be fixable)
* you cannot legally do it when foreign keys reference your unique key
* you cannot do it to exclusion constraints or non-immediate indexes
All of those are fixable (and most are) within REINDEX CONCURRENTLY, so I
find that to be a major feature even if it's not as good as it could be.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
All the comments are addressed in version 8 attached, except for the
hashtable part, which requires some heavy changes.
On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
The code path used by REINDEX CONCURRENTLY makes it possible to
create an index in parallel with an existing one, not a completely new
index. Shouldn't this work for indexes used by exclusion constraints also?

But that fact might save things. I don't immediately see any reason that
adding a
if (!indisvalid)
return;
to check_exclusion_constraint wouldn't be sufficient if there's another
index with an equivalent definition.
Indeed, this might be enough as for CREATE INDEX CONCURRENTLY this code
path cannot be taken and only indexes created concurrently can be invalid.
Hence I am adding that in the patch with a comment explaining why.
+ /*
+  * Phase 2 of REINDEX CONCURRENTLY
+  *
+  * Build concurrent indexes in a separate transaction for each index to
+  * avoid having open transactions for an unnecessary long time. We also
+  * need to wait until no running transactions could have the parent table
+  * of index open. A concurrent build is done for each concurrent
+  * index that will replace the old indexes.
+  */
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+     Relation indexRel;
+     Oid indOid = lfirst_oid(lc);
+     Oid concurrentOid = lfirst_oid(lc2);
+     Oid relOid;
+     bool primary;
+     LOCKTAG *heapLockTag = NULL;
+     ListCell *cell;
+
+     /* Move to next concurrent item */
+     lc2 = lnext(lc2);
+
+     /* Start new transaction for this index concurrent build */
+     StartTransactionCommand();
+
+     /* Get the parent relation Oid */
+     relOid = IndexGetRelation(indOid, false);
+
+     /*
+      * Find the locktag of parent table for this index, we need to wait for
+      * locks on it.
+      */
+     foreach(cell, lockTags)
+     {
+         LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+         if (relOid == localTag->locktag_field2)
+             heapLockTag = localTag;
+     }
+
+     Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+     WaitForVirtualLocks(*heapLockTag, ShareLock);
Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
once for all relations after each phase? Otherwise the waiting time will
really start to hit when you do this on a somewhat busy server.
Each new index is built and set as ready in a separate single
transaction,
so doesn't it make sense to wait for the parent relation each time? It is
possible to wait for a parent relation only once during this phase but in
this case all the indexes of the same relation need to be set as ready in
the same transaction. So here the choice is either to wait for the same
relation multiple times for a single index or wait once for a parent
relation but we build all the concurrent indexes within the same
transaction. Choice 1 makes the code clearer and more robust to my mind, as
the phase 2 is done clearly for each index separately. Thoughts?
As far as I understand that code its purpose is to enforce that all
potential users have an up2date definition available. For that we
acquire a lock on all virtualxids of users using that table thus waiting
for them to finish.
Consider the scenario where you have a workload where most transactions
are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
a_2, b_1, b_2). With the current strategy you will do:

WaitForVirtualLocks(a_1) -- wait up to 10min
index_build(a_1)
WaitForVirtualLocks(a_2) -- wait up to 10min
index_build(a_2)
...
So instead of waiting up to 10 minutes for that phase you have to wait up
to 40.
This is necessary if you want to process each index entry in a different
transaction as WaitForVirtualLocks needs to wait for the locks held on the
parent table. If you want to do this wait once per transaction, the
solution would be to group the index builds in the same transaction for all
the indexes of the relation. One index per transaction looks more solid in
this case: if there is a failure during the process, only one index will be
incorrectly built. Also, when you run a REINDEX CONCURRENTLY, you should
not need to worry about the time it takes. The point is that this operation
is done in background and that the tables are still accessible during this
time.
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
I wonder if we should directly open it exclusive here given it's going to be
opened exclusively in a bit anyway. Not that that will really reduce the
deadlock likelihood since we already hold the ShareUpdateExclusiveLock
in session mode ...

I tried to use an AccessExclusiveLock here but it happens that this is not
compatible with index_set_state_flags. Does taking an exclusive lock
increment the transaction ID of a running transaction? Because what I am
seeing is that taking AccessExclusiveLock on this index does a transaction
update.
Yep, it does when wal_level = hot_standby because it logs the exclusive
lock to wal so the startup process on the standby can acquire it.

Imo that Assert needs to be moved to the existing callsites if there
isn't an equivalent one already.
OK. Leaving the assertion inside index_set_state_flags makes the code more
consistent with CREATE INDEX CONCURRENTLY, so the existing behavior is fine.
For those reasons current code sticks with ShareUpdateExclusiveLock. Not a
big deal btw...
Well, lock upgrades make deadlocks more likely.
Ok, off to v7:

+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
...
+ /*
+  * Take a lock on the old and new index before switching their names. This
+  * avoids having index swapping relying on relation renaming mechanism to
+  * get a lock on the relations involved.
+  */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
..
+ /*
+  * If the index swapped is a toast index, take an exclusive lock on its
+  * parent toast relation and then update reltoastidxid to the new index Oid
+  * value.
+  */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+     Relation pg_class;
+
+     /* Open pg_class and fetch a writable copy of the relation tuple */
+     pg_class = heap_open(parentOid, RowExclusiveLock);
+
+     /* Update the statistics of this pg_class entry with new toast index Oid */
+     index_update_stats(pg_class, false, false, newIndexOid, -1.0);
+
+     /* Close parent relation */
+     heap_close(pg_class, RowExclusiveLock);
+ }

ISTM the RowExclusiveLock on the toast table should be acquired before
the locks on the indexes.
Done.
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+  * Now we must wait until no running transaction could be using the
+  * index for a query. To do this, inquire which xacts currently would
+  * conflict with AccessExclusiveLock on the table -- ie, which ones
+  * have a lock of any kind on the table. Then wait for each of these
+  * xacts to commit or abort. Note we do not need to worry about xacts
+  * that open the table for reading after this point; they will see the
+  * index as invalid when they open the relation.
+  *
+  * Note: the reason we use actual lock acquisition here, rather than
+  * just checking the ProcArray and sleeping, is that deadlock is
+  * possible if one of the transactions in question is blocked trying
+  * to acquire an exclusive lock on our table. The lock code will
+  * detect deadlock and error out properly.
+  *
+  * Note: GetLockConflicts() never reports our own xid, hence we need
+  * not check for that. Also, prepared xacts are not reported, which
+  * is fine since they certainly aren't going to do anything more.
+  */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);

Most of that comment seems to belong to WaitForVirtualLocks instead of
this specific caller of WaitForVirtualLocks.
Done.
A comment in the header that it is doing the waiting would also be good.
In ReindexRelationsConcurrently I suggest s/bypassing/skipping/.
Done.
Btw, seeing that we have an indisvalid check on the toast table's index, do
we have any way to clean up such a dead index? I don't think it's allowed
to drop the index of a toast table. I.e. we possibly need to relax that
check for invalid indexes :/.
For the time being, no I don't think so, except by doing a manual cleanup
and removing the invalid pg_class entry in the catalogs. One way to do that
cleanly could be to have autovacuum remove the invalid toast indexes
automatically, but it is not dedicated to that and this is another
discussion.
I think the usage of list_append_unique_oids in
ReindexRelationsConcurrently might get too expensive in larger
schemas. It's O(n^2) in the current usage and schemas with lots of
relations/indexes aren't unlikely candidates for this feature.
The easiest solution probably is to use a hashtable.
Hum... This requires some thinking that will change the basics inside
ReindexRelationsConcurrently...
Let me play a bit with the hashtable APIs and I'll come back to that later.
ReindexRelationsConcurrently should do a CHECK_FOR_INTERRUPTS() every
once in a while, it's currently not gracefully interruptible which
probably is bad in a bigger schema.
Done. I added some checks at each phase before beginning a new transaction.
--
Michael Paquier
http://michael.otacoo.com
Attachments:
20130125_reindex_concurrently_v8.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..b12e684 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,10 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will not perform a concurrent build if <literal>
+ CONCURRENTLY</> is not specified. To build the index without interfering
+ with production you should drop the index and reissue the <command>CREATE
+ INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</> command.
</para>
</listitem>
@@ -139,6 +140,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +247,103 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index that will replace the one to
+ be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to make the new
+ index valid for the other backends. Once this is performed, the old
+ and fresh indexes are swapped, and the old index is marked as invalid
+ in a third transaction. Finally two additional transactions are used to mark
+ the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and perform <command>REINDEX CONCURRENTLY</> once again.
+ The concurrent index created during the processing has a name ending
+ with the suffix _cct.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds concurrently only the non-system indexes. System
+ indexes are rebuilt in a non-concurrent way.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +375,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index db51e0b..6c7179d 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2654,7 +2654,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..2e0798b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be a toast relation. Sufficient locks are normally taken on
+ * the related relations once this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation, in which case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1084,7 +1094,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1096,6 +1106,394 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken during
+ * this operation so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /*
+ * Now build the index. If the parent relation is a toast relation, its
+ * reltoastidxid is updated when index_concurrent_swap is called.
+ */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new index in a concurrent context. For the time
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ * For toast indexes, it is also necessary to modify reltoastidxid of the parent
+ * relation, so in this case we also need to take RowExclusiveLock on the
+ * parent relation until the end of the transaction block.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel, parentRel;
+
+ /*
+ * If the index swapped is a toast index, take a row exclusive lock on its
+ * parent toast relation before locking the involved indexes; this lock on
+ * the toast table is necessary because its reltoastidxid will be updated
+ * to the new index Oid.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ /* Open the parent toast relation */
+ parentRel = heap_open(parentOid, RowExclusiveLock);
+ }
+
+ /*
+ * Take a lock on the old and new indexes before switching their names. This
+ * avoids having the index swap rely on the relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName(nameOld,
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally, rename the old index to the new index's former name */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The locks taken previously are not released until the end of the transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * If the index swapped is a toast index, update reltoastidxid of its
+ * parent toast relation to the new index Oid (the lock on the parent
+ * relation was already taken above).
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ /* Update the statistics of this pg_class entry with new toast index Oid */
+ index_update_stats(parentRel, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(parentRel, RowExclusiveLock);
+ }
+
+ /*
+ * Scan for potential foreign keys referencing the index being swapped and
+ * switch their dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of its
+ * parent relation. This function should be called when initiating an index
+ * drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion is done through performDeletion, as otherwise the
+ * dependencies of the index would not be dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index dropped here is not alive; if it is, it might
+ * still be used by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1326,7 +1724,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1408,17 +1805,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1446,63 +1834,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1515,13 +1848,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1943,6 +2270,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * of the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1956,7 +2285,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -2071,7 +2401,8 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
+ (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) &&
+ istoastupdate ?
RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
@@ -3189,7 +3520,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign key references for a given index to a new index created
+ * concurrently. This is used when swapping indexes for a concurrent process.
+ * Constraints that are not referenced externally, like primary keys or unique
+ * indexes, are switched by the concurrent index creation and drop machinery
+ * of index.c.
+ * This function also takes care of switching the dependencies of the foreign
+ * keys from the old index to the new index in pg_depend.
+ *
+ * The following steps are taken to complete the process:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to the
+ * parent relation of the index being swapped through confrelid.
+ * 2) Check in this list for the foreign keys that use the old index as
+ * reference index with conindid.
+ * 3) Update the field conindid to the new index Oid on all those foreign keys.
+ * 4) Switch the dependencies of the foreign keys to the new index.
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated
+ * with the parent relation by scanning on confrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 94efd13..36d2e68 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,567 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationsConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given list of relation Oids. The list of
+ * indexes to rebuild is extracted from this input list, whose elements can be
+ * either relations or indexes.
+ * Each reindexing step is done simultaneously for all the extracted indexes.
+ */
+bool
+ReindexRelationsConcurrently(List *relationIds)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * list of relation Oids given by the caller. For each element of the given
+ * list, if the relkind of the relation Oid is a table, all its valid
+ * indexes will be rebuilt, including its associated toast table indexes. If
+ * the relkind is an index, this index itself will be rebuilt. The locks
+ * taken on parent relations and involved indexes are kept until this
+ * transaction is committed, to protect against schema changes that might
+ * occur before the session lock is taken on each relation.
+ */
+ foreach(lc, relationIds)
+ {
+ Oid relationOid = lfirst_oid(lc);
+
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds,
+ cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_append_unique_oid(indexIds, relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ /*
+ * Build a unique list of parent relation Oids based on the extracted index
+ * list. This list of Oids is used to take session locks on the parent
+ * relations of the indexes, to prevent a concurrent drop of the relations
+ * involved in the concurrent reindex.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid parentOid = IndexGetRelation(lfirst_oid(lc), false);
+ parentRelationIds = list_append_unique_oid(parentRelationIds, parentOid);
+ }
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index, a new index based on the same
+ * data as the former one; it is only registered in the catalogs at this
+ * point and will be built afterwards. All the operations can be performed
+ * at the same time for all the indexes of a parent relation, including
+ * the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a plain or toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock on it is
+ * also needed.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save a palloc'd copy of each lockrelid to protect the concurrent
+ * relations from being dropped, then close the relations. A copy is
+ * needed because the local variable is overwritten at each iteration.
+ * The lockrelid of the parent relation is not saved here to avoid taking
+ * multiple locks on the same relation; instead we rely on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock tags for the subsequent visibility checks, as
+ * other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the parent relation's lockrelid to the list
+ * of locked relations; a copy is needed as the relation is closed below.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The
+ * index is marked as not ready and invalid so that no other
+ * transactions will try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each
+ * index, to avoid keeping transactions open for an unnecessarily long
+ * time. We also need to wait until no running transaction could still
+ * have the parent table of the index open. A concurrent build is then
+ * done for each concurrent index that will replace an old index.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(relOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs
+ * that might have occurred in the parent table, and are marked valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(lc2, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * this concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent index is now valid, as it contains all the necessary
+ * tuples. However, it might not contain tuples deleted just before the
+ * reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause backends to update their entries for
+ * the concurrent index, but the relcache also needs to be invalidated.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it unusable by
+ * other backends once the swapping transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation so that their caches can
+ * be invalidated. ShareUpdateExclusiveLock is enough here, as a
+ * session-level lock of the same mode is already held on these
+ * relations; taking a stronger lock would only increase the likelihood
+ * of deadlock.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for the transactions that might use them. Each operation is performed
+ * in a separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ LOCKTAG *heapLockTag = NULL;
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(lc2, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc2);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /* Finish the index invalidation and set it as dead */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion,
+ * or the dependencies of the old indexes will not be dropped. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used here, as
+ * the indexes are already considered dead and invalid, so they will
+ * not be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this old index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index treated, do not commit the transaction yet. This
+ * will be done once all the locks on the indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the
+ * parent tables and their indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1997,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +2024,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2143,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationsConcurrently(list_make1_oid(indOid));
return indOid;
}
@@ -1747,18 +2222,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent && !ReindexRelationsConcurrently(list_make1_oid(heapOid)))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2264,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2279,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for SYSTEM, but it is for
+ * DATABASE.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2370,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed with the normal process, including pg_class, as
+ * they could be corrupted and the concurrent process itself relies on
+ * them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationsConcurrently(list_make1_oid(relid));
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 9206195..988ead5 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index only exists when it was created in a concurrent
+ * context. Since CREATE INDEX CONCURRENTLY is not available for
+ * exclusion constraints, this code path can only be reached by REINDEX
+ * CONCURRENTLY. In that case the same index exists in parallel to this
+ * one, so we can bypass this check, as it has already been done on the
+ * other index. If exclusion constraints become supported by CREATE
+ * INDEX CONCURRENTLY in the future, this should be removed or adapted
+ * accordingly.
+ */
+ if (!index->rd_index->indisvalid)
+ return;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2da08d1..b9cd66b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3602,6 +3602,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9e313c8..c7a5345 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1841,6 +1841,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 828e110..be5dbc8 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6671,29 +6671,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..b5d8cc0 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,117 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on the relation
+ * referred to by the given LOCKTAG. To do this, inquire which xacts
+ * currently would conflict with the given lockmode on the relation --
+ * ie, which ones have a lock that permits writing it. Then wait for
+ * each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8904c6f..7360dda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1282,15 +1282,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1302,8 +1306,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..335a620 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,24 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +105,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a cache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..15e41f1 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationsConcurrently(List *relationIds);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d8678e5..e5377b4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2521,6 +2521,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..0b591ce 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..4053b53
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE reind_con_tab CONCURRENTLY; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I think the usage of list_append_unique_oids in
ReindexRelationsConcurrently might get too expensive in larger
schemas. It's O(n^2) in the current usage, and schemas with lots of
relations/indexes aren't unlikely candidates for this feature.
The easiest solution probably is to use a hashtable.
I just had a look at the hashtable APIs and I do not think they are well
suited to building the list of unique index OIDs that need to be rebuilt
concurrently. They would be a better fit for mapping the index OIDs to
something else, like the concurrent OIDs, but even then the code would be
more readable if left as is.
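To illustrate the cost difference under discussion, here is a minimal standalone sketch (plain C with a toy open-addressing set; it does not use PostgreSQL's List or dynahash APIs, and `Oid`, `oidset_create` and `collect_unique_oids` are names invented for the example):

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;

/* Toy open-addressing hash set for OIDs. The table must be created with
 * nslots (a power of two) larger than the number of OIDs inserted, since
 * slot value 0 (InvalidOid) is used as the empty marker. */
typedef struct
{
    Oid    *slots;
    size_t  nslots;
} OidSet;

static OidSet *
oidset_create(size_t nslots)
{
    OidSet *set = malloc(sizeof(OidSet));

    set->nslots = nslots;
    set->slots = calloc(nslots, sizeof(Oid));
    return set;
}

/* Insert oid if absent; returns 1 if newly added, 0 if already present.
 * Linear probing; expected O(1) per call while the table stays sparse. */
static int
oidset_add(OidSet *set, Oid oid)
{
    size_t  i = (oid * 2654435761u) & (set->nslots - 1);

    while (set->slots[i] != 0)
    {
        if (set->slots[i] == oid)
            return 0;           /* duplicate, nothing to do */
        i = (i + 1) & (set->nslots - 1);
    }
    set->slots[i] = oid;
    return 1;
}

/* Collect the unique OIDs from input[] into unique[], preserving first-seen
 * order; returns their count. This replaces an O(n^2) "append only if not
 * already in the list" scan with an O(n) pass. */
static size_t
collect_unique_oids(const Oid *input, size_t n, Oid *unique, OidSet *set)
{
    size_t  nunique = 0;

    for (size_t k = 0; k < n; k++)
        if (oidset_add(set, input[k]))
            unique[nunique++] = input[k];
    return nunique;
}
```

The list-based approach rescans the whole accumulated list for every new OID, which is what makes it quadratic; the set makes each membership test constant time on average.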
--
Michael Paquier
http://michael.otacoo.com
On 2013-01-25 14:11:39 +0900, Michael Paquier wrote:
On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
I think the usage of list_append_unique_oids in
ReindexRelationsConcurrently might get too expensive in larger
schemas. Its O(n^2) in the current usage and schemas with lots of
relations/indexes aren't unlikely candidates for this feature.
The easiest solution probably is to use a hashtable.

I just had a look at the hashtable APIs and I do not think they are well
suited to building the list of unique index OIDs that need to be rebuilt
concurrently. They would be a better fit for mapping the index OIDs to
something else, like the concurrent OIDs, but even then the code would be
more readable if left as is.
It sure isn't optimal, but it should do the trick if you use the
hash_seq stuff to iterate the hash afterwards. And you could use it to
map to the respective locks et al.
If you prefer other ways to implement it I guess the other easy solution
is to add the values without preventing duplicates and then sort &
remove duplicates in the end. Probably ends up being slightly more code,
but I am not sure.
I don't think we can leave the quadratic part in there as-is.
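The sort-then-deduplicate alternative mentioned above can be sketched in the same standalone style (again plain C, not backend code; `sort_unique_oids` is an invented name):

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;

/* qsort comparator for OIDs, avoiding overflow from subtraction. */
static int
oid_cmp(const void *a, const void *b)
{
    Oid     oa = *(const Oid *) a;
    Oid     ob = *(const Oid *) b;

    return (oa > ob) - (oa < ob);
}

/* Sort oids[] and strip duplicates in place; returns the unique count.
 * O(n log n) overall, versus O(n^2) for append-if-not-present. Note that
 * unlike the hash-set approach this loses the original insertion order. */
static size_t
sort_unique_oids(Oid *oids, size_t n)
{
    size_t  nunique;

    if (n == 0)
        return 0;
    qsort(oids, n, sizeof(Oid), oid_cmp);
    nunique = 1;
    for (size_t i = 1; i < n; i++)
        if (oids[i] != oids[nunique - 1])
            oids[nunique++] = oids[i];
    return nunique;
}
```

As the email notes, this is slightly more code than the naive append, but it removes the quadratic behavior without needing any hash-table machinery.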
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-01-25 13:48:50 +0900, Michael Paquier wrote:
All the comments are addressed in version 8 attached, except for the
hashtable part, which requires some heavy changes.

On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
The code path used by REINDEX CONCURRENTLY permits creating an index in
parallel with an existing one rather than a completely new index.
Shouldn't this work for indexes used by exclusion constraints also?

But that fact might save things. I don't immediately see any reason that
adding a

if (!indisvalid)
    return;

to check_exclusion_constraint wouldn't be sufficient if there's another
index with an equivalent definition.

Indeed, this might be enough, as for CREATE INDEX CONCURRENTLY this code
path cannot be taken and only indexes created concurrently can be invalid.
Hence I am adding that in the patch with a comment explaining why.
I don't really know anything about those mechanics, so some input from
somebody who does would be very much appreciated.
+ /*
+  * Phase 2 of REINDEX CONCURRENTLY
+  */
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+     WaitForVirtualLocks(*heapLockTag, ShareLock);

Why do we have to do the WaitForVirtualLocks here? Shouldn't we do this
once for all relations after each phase? Otherwise the waiting time will
really start to hit when you do this on a somewhat busy server.

Each new index is built and set as ready in a separate single transaction,
so doesn't it make sense to wait for the parent relation each time? It is
possible to wait for a parent relation only once during this phase, but in
that case all the indexes of the same relation need to be set as ready in
the same transaction. So here the choice is either to wait for the same
relation multiple times for a single index, or wait once for a parent
relation but build all the concurrent indexes within the same
transaction. Choice 1 makes the code clearer and more robust to my mind,
as phase 2 is done clearly for each index separately. Thoughts?
As far as I understand that code, its purpose is to enforce that all
potential users have an up-to-date definition available. For that we
acquire a lock on all virtualxids of users using that table, thus waiting
for them to finish.
Consider the scenario where you have a workload where most transactions
are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
a_2, b_1, b_2). With the current strategy you will do:

WaitForVirtualLocks(a_1) -- wait up to 10min
index_build(a_1)
WaitForVirtualLocks(a_2) -- wait up to 10min
index_build(a_2)
...

So instead of waiting up to 10 minutes for that phase you have to wait up
to 40.

This is necessary if you want to process each index entry in a different
transaction, as WaitForVirtualLocks needs to wait for the locks held on the
parent table. If you want to do this wait once per transaction, the
solution would be to group the index builds in the same transaction for all
the indexes of the relation. One index per transaction looks more solid in
this case: if there is a failure during the process, only one index will be
incorrectly built.
I cannot really follow you here. The reason why we need to wait here is
*only* to make sure that nobody still has the old list of indexes
around (which probably could even be relaxed for reindex concurrently,
but that's a separate optimization).
So if we wait for all relevant transactions to end before starting phase
2 proper, we are fine, independent of how many indexes we build in a
single transaction.
Also, when you run a REINDEX CONCURRENTLY, you should
not need to worry about the time it takes. The point is that this operation
is done in background and that the tables are still accessible during this
time.
I don't think that argument holds that much water. Having open
transactions for too long *does* incur a rather noticeable overhead. And
you definitely do want such operations to finish as quickly as possible,
even if its just because you can go home only afterwards ;)
Really, imagine doing this to 100 indexes on a system where
transactions regularly take 30 minutes (only needs one at a time). Minus
the actual build time, that's very approximately 4h against half a month.
Btw, seeing that we have an indisvalid check on the toast table's index, do
we have any way to clean up such a dead index? I don't think it's allowed
to drop the index of a toast table. I.e. we possibly need to relax that
check for invalid indexes :/

For the time being, no, I don't think so, except by doing a manual cleanup
and removing the invalid pg_class entry from the catalogs. One way to do that
cleanly could be to have autovacuum remove the invalid toast indexes
automatically, but it is not dedicated to that and this is another
discussion.
Hm. Don't think that's acceptable :/
As I mentioned somewhere else, I don't see how to do a concurrent build
of the toast index at all, given there is exactly one index hardcoded in
tuptoaster.c, so the second index won't get updated before the switch has
been made.
Haven't yet looked at the new patch - do you plan to provide an updated
version addressing some of the remaining issues soon? Don't want to
review this if you nearly have the next version available.
Greetings,
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Jan 27, 2013 at 1:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-01-25 14:11:39 +0900, Michael Paquier wrote:
It sure isn't optimal, but it should do the trick if you use the
hash_seq stuff to iterate the hash afterwards. And you could use it to
map to the respective locks et al.

If you prefer other ways to implement it I guess the other easy solution
is to add the values without preventing duplicates and then sort &
remove duplicates in the end. Probably ends up being slightly more code,
but I am not sure.
Indeed, I began playing with the HTAB functions, and it looks like the only
correct way to use them would be a hash table keyed by the index OID, with
as entry:
- the index OID itself
- the concurrent OID
And a second hash table with the parent relation OID as key and the
LOCKTAG for each parent relation as output.
I don't think we can leave the quadratic part in there as-is.
Sure, that is understandable.
--
Michael Paquier
http://michael.otacoo.com
On Sun, Jan 27, 2013 at 1:52 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-01-25 13:48:50 +0900, Michael Paquier wrote:
All the comments are addressed in version 8 attached, except for the
hashtable part, which requires some heavy changes.

On Thu, Jan 24, 2013 at 3:41 AM, Andres Freund <andres@2ndquadrant.com> wrote:

On 2013-01-15 18:16:59 +0900, Michael Paquier wrote:
The code path used by REINDEX CONCURRENTLY permits creating an index in
parallel with an existing one rather than a completely new index. Shouldn't
this work for indexes used by exclusion constraints also?

But that fact might save things. I don't immediately see any reason that
adding a

if (!indisvalid)
    return;

to check_exclusion_constraint wouldn't be sufficient if there's another
index with an equivalent definition.

Indeed, this might be enough, as for CREATE INDEX CONCURRENTLY this code
path cannot be taken and only indexes created concurrently can be invalid.
Hence I am adding that in the patch with a comment explaining why.
I don't really know anything about those mechanics, so some input from
somebody who does would be very much appreciated.+ /* + * Phase 2 of REINDEX CONCURRENTLY + */ + + /* Get the first element of concurrent index list */ + lc2 = list_head(concurrentIndexIds); + + foreach(lc, indexIds) + { + WaitForVirtualLocks(*heapLockTag, ShareLock);Why do we have to do the WaitForVirtualLocks here? Shouldn't we do
this
once for all relations after each phase? Otherwise the waiting
time will
really start to hit when you do this on a somewhat busy server.
Each new index is built and set as ready in a separate single
transaction,
so doesn't it make sense to wait for the parent relation each time.
It is
possible to wait for a parent relation only once during this phase
but in
this case all the indexes of the same relation need to be set as
ready in
the same transaction. So here the choice is either to wait for the
same
relation multiple times for a single index or wait once for a parent
relation but we build all the concurrent indexes within the same
transaction. Choice 1 makes the code clearer and more robust to mymind
as
the phase 2 is done clearly for each index separately. Thoughts?
As far as I understand that code, its purpose is to enforce that all
potential users have an up-to-date definition available. For that we
acquire a lock on all virtualxids of users using that table, thus waiting
for them to finish.
Consider the scenario where you have a workload where most transactions
are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
a_2, b_1, b_2). With the current strategy you will do:

WaitForVirtualLocks(a_1) -- wait up to 10min
index_build(a_1)
WaitForVirtualLocks(a_2) -- wait up to 10min
index_build(a_2)
...

So instead of waiting up to 10 minutes for that phase you have to wait up
to 40.

This is necessary if you want to process each index entry in a different
transaction, as WaitForVirtualLocks needs to wait for the locks held on the
parent table. If you want to do this wait once per transaction, the
solution would be to group the index builds in the same transaction for all
the indexes of the relation. One index per transaction looks more solid in
this case: if there is a failure during the process, only one index will be
incorrectly built.

I cannot really follow you here.
OK let's be more explicit...
The reason why we need to wait here is
*only* to make sure that nobody still has the old list of indexes
around (which probably could even be relaxed for reindex concurrently,
but that's a separate optimization).

In order to do that, you need to wait for the *parent relations* and not
the indexes themselves, no?
Based on 2 facts:
- each index build is done in a single transaction
- a wait needs to be done on the parent relation before each transaction
You need to wait for the parent relation multiple times depending on the
number of indexes in it. You could optimize that by building all the
indexes of the *same parent relation* in a single transaction.
So, for example in the case of this table:
CREATE TABLE tab (col1 int PRIMARY KEY, col2 int);
CREATE INDEX ind ON tab (col2);
If the primary key index and the second index on col2 are built in a single
transaction, you need to wait for the locks on the parent relation
'tab' only once.
So if we wait for all relevant transactions to end before starting phase
2 proper, we are fine, independent of how many indexes we build in a
single transaction.
The reason why all the index builds are done in a single transaction is
that you mentioned in a previous review (v3?) that we should do the builds
in a single transaction for *each* index. That looked fair based on the
fact that the transaction time for each index could be reduced, the
downside being that you wait more on the parent relation.
Btw, seeing that we have an indisvalid check on the toast table's index, do
we have any way to clean up such a dead index? I don't think it's allowed
to drop the index of a toast table. I.e. we possibly need to relax that
check for invalid indexes :/

For the time being, no, I don't think so, except by doing a manual cleanup
and removing the invalid pg_class entry from the catalogs. One way to do that
cleanly could be to have autovacuum remove the invalid toast indexes
automatically, but it is not dedicated to that and this is another
discussion.

Hm. Don't think that's acceptable :/
As I mentioned somewhere else, I don't see how to do a concurrent build
of the toast index at all, given there is exactly one index hardcoded in
tuptoaster.c, so the second index won't get updated before the switch has
been made.

Haven't yet looked at the new patch - do you plan to provide an updated
version addressing some of the remaining issues soon? Don't want to
review this if you nearly have the next version available.
Before providing more effort in coding, I think it is better to be clear
about the strategy to use on the 2 following points:
1) At the index build phase, is it better to build each index in a single
separate transaction? Or group the builds in a transaction for each parent
table? This is solvable but the strategy should be clear.
2) Find a solution for invalid toast indexes, which is not that easy. One
solution could be to use an autovacuum process to clean up the invalid
indexes of toast tables automatically. Another solution is to skip the
reindex for toast indexes, making the feature less usable.
If a solution or an agreement is not found for those 2 points, I think it
will be fair to simply reject the patch.
It looks like this feature still has too many disadvantages compared to the
advantages it could bring with the current infrastructure (SnapshotNow
problems, what to do with invalid toast indexes, etc.), so I would tend to
agree with Tom and postpone this feature until the infrastructure is more
mature, one of the main things being the non-MVCC'ed catalogs.
--
Michael Paquier
http://michael.otacoo.com
On 2013-01-27 07:54:43 +0900, Michael Paquier wrote:
On Sun, Jan 27, 2013 at 1:52 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-01-25 13:48:50 +0900, Michael Paquier wrote:
As far as I understand that code, its purpose is to enforce that all
potential users have an up-to-date definition available. For that we
acquire a lock on all virtualxids of users using that table, thus waiting
for them to finish.
Consider the scenario where you have a workload where most transactions
are fairly long (say 10min) and use the same tables (a,b)/indexes(a_1,
a_2, b_1, b_2). With the current strategy you will do:

WaitForVirtualLocks(a_1) -- wait up to 10min
index_build(a_1)
WaitForVirtualLocks(a_2) -- wait up to 10min
index_build(a_2)
...

So instead of waiting up to 10 minutes for that phase you have to wait up
to 40.

This is necessary if you want to process each index entry in a different
transaction, as WaitForVirtualLocks needs to wait for the locks held on the
parent table. If you want to do this wait once per transaction, the
solution would be to group the index builds in the same transaction for all
the indexes of the relation. One index per transaction looks more solid in
this case: if there is a failure during the process, only one index will be
incorrectly built.

I cannot really follow you here.

OK let's be more explicit...
The reason why we need to wait here is
*only* to make sure that nobody still has the old list of indexes
around (which probably could even be relaxed for reindex concurrently,
but that's a separate optimization).

In order to do that, you need to wait for the *parent relations* and not
the indexes themselves, no?
Based on 2 facts:
- each index build is done in a single transaction
- a wait needs to be done on the parent relation before each transaction
You need to wait for the parent relation multiple times depending on the
number of indexes in it. You could optimize that by building all the
indexes of the *same parent relation* in a single transaction.
I think you're misunderstanding how this part works a bit. We don't
acquire locks on the table itself, but we get a list of all transactions
we would conflict with if we were to acquire a lock of a certain
strength on the table (GetLockConflicts(locktag, mode)). We then wait
for each transaction in the resulting list via the VirtualXact mechanism
(VirtualXactLock(*lockholder)).
It doesn't matter whether all that waiting happens in the same transaction
the initial index build is done in, as long as we keep the session locks
preventing other schema modifications. Nobody can go back and see an
older index list after we've done the above wait once.
So the following should be perfectly fine:
StartTransactionCommand();
BuildListOfIndexes();
foreach(index in indexes)
    DefineNewIndex(index);
CommitTransactionCommand();

StartTransactionCommand();
foreach(table in tables)
    GetLockConflicts()
    foreach(conflict in conflicts)
        VirtualXactLocks()
CommitTransactionCommand();

foreach(index in indexes)
    StartTransactionCommand();
    InitialIndexBuild(index)
    CommitTransactionCommand();
...
It looks like this feature still has too many disadvantages compared to the
advantages it could bring with the current infrastructure (SnapshotNow
problems, what to do with invalid toast indexes, etc.), so I would tend to
agree with Tom and postpone this feature until the infrastructure is more
mature, one of the main things being the non-MVCC'ed catalogs.
I think while catalog mvcc snapshots would make this easier, most
problems, basically all but the switching of relations, are pretty much
independent from that fact. All the waiting etc. will still be there.
I can see an argument for pushing it to the next CF because it's not
really there yet...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 28, 2013 at 7:39 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-01-27 07:54:43 +0900, Michael Paquier wrote:
I think you're misunderstanding how this part works a bit. We don't
acquire locks on the table itself, but we get a list of all transactions
we would conflict with if we were to acquire a lock of a certain
strength on the table (GetLockConflicts(locktag, mode)). We then wait
for each transaction in the resulting list via the VirtualXact mechanism
(VirtualXactLock(*lockholder)).
It doesn't matter all that waiting happens in the same transaction the
initial index build is done in as long as we keep the session locks
preventing other schema modifications. Nobody can go back and see an
older index list after we've done the above wait once.
Don't worry I got it. I just thought that it was necessary to wait for the
locks taken on the parent relation by other backends just *before* building
the index. It seemed more stable.
So the following should be perfectly fine:

StartTransactionCommand();
BuildListOfIndexes();
foreach(index in indexes)
    DefineNewIndex(index);
CommitTransactionCommand();

StartTransactionCommand();
foreach(table in tables)
    GetLockConflicts()
    foreach(conflict in conflicts)
        VirtualXactLocks()
CommitTransactionCommand();

foreach(index in indexes)
    StartTransactionCommand();
    InitialIndexBuild(index)
    CommitTransactionCommand();
So your point is simply to wait for all the locks currently taken on each
table in a different transaction only once and for all, independently from
the build and validation phases. Correct?
It looks like this feature still has too many disadvantages compared to the
advantages it could bring with the current infrastructure (SnapshotNow
problems, what to do with invalid toast indexes, etc.), so I would tend to
agree with Tom and postpone this feature until the infrastructure is more
mature, one of the main things being the non-MVCC'ed catalogs.

I think while catalog MVCC snapshots would make this easier, most
problems, basically all but the switching of relations, are pretty much
independent from that fact. All the waiting etc. will still be there.

I can see an argument for pushing it to the next CF because it's not
really there yet...
Even if we get this patch in a shape that you think is sufficient to make
it reviewable by a committer within a couple of days, there are still many
doubts from many people regarding this feature, so this is going to take
far more time to put it in a shape that would satisfy a vast majority. So
it is honestly wiser to work on that later.
Another argument that would be enough for a rejection of this patch by a
committer is the problem of invalid toast indexes that cannot be removed
cleanly by an operator. As long as there is not a clean solution for that...
--
Michael Paquier
http://michael.otacoo.com
Hi,
On 2013-01-28 20:31:48 +0900, Michael Paquier wrote:
On Mon, Jan 28, 2013 at 7:39 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-01-27 07:54:43 +0900, Michael Paquier wrote:
I think you're misunderstanding how this part works a bit. We don't
acquire locks on the table itself, but we get a list of all transactions
we would conflict with if we were to acquire a lock of a certain
strength on the table (GetLockConflicts(locktag, mode)). We then wait
for each transaction in the resulting list via the VirtualXact mechanism
(VirtualXactLock(*lockholder)).
It doesn't matter whether all that waiting happens in the same transaction the
initial index build is done in as long as we keep the session locks
preventing other schema modifications. Nobody can go back and see an
older index list after we've done the above wait once.

Don't worry I got it. I just thought that it was necessary to wait for the
locks taken on the parent relation by other backends just *before* building
the index. It seemed more stable.
I don't see any need for that. It's really only about making sure that the
relcache entry for the index list - and by extension rd_indexattr - in
all other transactions that could possibly write to the table is
up to date.
As a relation_open with a lock (which is done for every write) will
always drain the invalidations, that's guaranteed if we wait that way.
So the following should be perfectly fine:

StartTransactionCommand();
BuildListOfIndexes();
foreach(index in indexes)
    DefineNewIndex(index);
CommitTransactionCommand();

StartTransactionCommand();
foreach(table in tables)
    GetLockConflicts()
    foreach(conflict in conflicts)
        VirtualXactLocks()
CommitTransactionCommand();

foreach(index in indexes)
    StartTransactionCommand();
    InitialIndexBuild(index)
    CommitTransactionCommand();

So your point is simply to wait for all the locks currently taken on each
table in a different transaction only once and for all, independently from
the build and validation phases. Correct?
Exactly. That will batch the wait for the transactions together and thus
will greatly decrease the overhead of doing a concurrent reindex
(wall, not cpu-clock wise).
It looks like this feature still has too many disadvantages compared to the
advantages it could bring with the current infrastructure (SnapshotNow
problems, what to do with invalid toast indexes, etc.), so I would tend to
agree with Tom and postpone this feature until the infrastructure is more
mature, one of the main things being the non-MVCC'ed catalogs.

I think while catalog MVCC snapshots would make this easier, most
problems, basically all but the switching of relations, are pretty much
independent from that fact. All the waiting etc. will still be there.

I can see an argument for pushing it to the next CF because it's not
really there yet...

Even if we get this patch in a shape that you think is sufficient to make
it reviewable by a committer within a couple of days, there are still many
doubts from many people regarding this feature, so this is going to take
far more time to put it in a shape that would satisfy a vast majority. So
it is honestly wiser to work on that later.
I really haven't heard too many arguments from others after the initial
round.
Right now I "only" recall Tom and Robert doubting the usefulness, right?
I think most of the work in this patch is completely independent from
the snapshot stuff, so I really don't see much of an argument to make it
dependent on catalog snapshots.
Another argument that would be enough for a rejection of this patch by a
committer is the problem of invalid toast indexes that cannot be removed
cleanly by an operator. As long as there is not a clean solution for
that...
I think that part is relatively easy to fix, I wouldn't worry too
much.
The more complex part is how to get tuptoaster.c to update the
concurrently created index. That's what I worry about. It's not going
through the normal executor paths but manually updates the toast
index - which means it won't update the indisready && !indisvalid
index...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 28, 2013 at 8:44 PM, Andres Freund <andres@anarazel.de> wrote:
Another argument that would be enough for a rejection of this patch by a
committer is the problem of invalid toast indexes that cannot be removed
cleanly by an operator. As long as there is not a clean solution for
that...

I think that part is relatively easy to fix, I wouldn't worry too
much.
The more complex part is how to get tuptoaster.c to update the
concurrently created index. That's what I worry about. Its not going
through the normal executor paths but manually updates the toast
index - which means it won't update the indisready && !indisvalid
index...
I included in the patch some stuff to update the reltoastidxid of the
parent relation of the toast index. Have a look at
index.c:index_concurrent_swap. The particular case I had in mind was if
there is a failure of the server during the concurrent reindex of a toast
index. When server restarts, the toast relation will have an invalid index
and this cannot be dropped by an operator via SQL.
--
Michael Paquier
http://michael.otacoo.com
On 2013-01-28 20:50:21 +0900, Michael Paquier wrote:
On Mon, Jan 28, 2013 at 8:44 PM, Andres Freund <andres@anarazel.de> wrote:
Another argument that would be enough for a rejection of this patch by a
committer is the problem of invalid toast indexes that cannot be removed
cleanly by an operator. As long as there is not a clean solution for
that...

I think that part is relatively easy to fix, I wouldn't worry too
much.
The more complex part is how to get tuptoaster.c to update the
concurrently created index. That's what I worry about. Its not going
through the normal executor paths but manually updates the toast
index - which means it won't update the indisready && !indisvalid
index...

I included in the patch some stuff to update the reltoastidxid of the
parent relation of the toast index. Have a look at
index.c:index_concurrent_swap. The particular case I had in mind was if
there is a failure of the server during the concurrent reindex of a toast
index.
That's not enough, unfortunately. The problem scenario is the following:
toast table: pg_toast.pg_toast_16384
toast index (via reltoastidxid): pg_toast.pg_toast_16384_index
REINDEX CONCURRENTLY PHASE #1
REINDEX CONCURRENTLY PHASE #2
toast table: pg_toast.pg_toast_16384
toast index (via reltoastidxid): pg_toast.pg_toast_16384_index, ready & valid
toast index (via pg_index): pg_toast.pg_toast_16384_index_tmp, ready & !valid
If a tuple gets toasted in this state tuptoaster.c will update
16384_index but not 16384_index_tmp. In normal tables this works because
nodeModifyTable uses ExecInsertIndexTuples which updates all ready
indexes. tuptoaster.c does something different though, it calls
index_insert exactly on the one expected index, not on the other ones.
Makes sense?
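A toy model of this failure mode (plain C; none of these types or functions exist in PostgreSQL, they only mimic the control flow described above) shows why the concurrently built toast index silently misses tuples:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins, not real PostgreSQL structures. */
typedef struct
{
    bool    ready;      /* like indisready */
    bool    valid;      /* like indisvalid */
    int     ntuples;    /* how many tuples were inserted into this index */
} ToyIndex;

/* What nodeModifyTable does via ExecInsertIndexTuples:
 * update every index that is marked ready, valid or not. */
static void
executor_style_insert(ToyIndex *indexes, size_t nindexes)
{
    for (size_t i = 0; i < nindexes; i++)
        if (indexes[i].ready)
            indexes[i].ntuples++;
}

/* What tuptoaster.c effectively does: index_insert on the single
 * index named by reltoastidxid, ignoring any other ready index. */
static void
toaster_style_insert(ToyIndex *indexes, size_t reltoastidx)
{
    indexes[reltoastidx].ntuples++;
}
```

In this model, a table with a valid index at slot 0 and a ready-but-invalid concurrent index at slot 1 gets both updated by the executor path, but only slot 0 updated by the toaster path, which is exactly the inconsistency described for pg_toast_16384_index_tmp.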
When server restarts, the toast relation will have an invalid index
and this cannot be dropped by an operator via SQL.
That requires about two lines of special case code in
RangeVarCallbackForDropRelation, that doesn't seem to be too bad to me.
I.e. allow the case where it's IsSystemClass(classform) && relkind ==
RELKIND_INDEX && !indisvalid.
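For illustration only, the proposed relaxation can be restated as a standalone predicate (toy code; `allow_drop` is an invented name, and the real check lives in RangeVarCallbackForDropRelation with a different signature):

```c
#include <stdbool.h>

#define RELKIND_RELATION 'r'
#define RELKIND_INDEX    'i'

/* Toy version of the check sketched above: normally a system relation
 * cannot be dropped, but an invalid system index would be allowed
 * through so that orphaned toast indexes can be cleaned up via SQL. */
static bool
allow_drop(bool is_system_class, char relkind, bool indisvalid)
{
    if (!is_system_class)
        return true;            /* user relations: usual rules apply */
    return relkind == RELKIND_INDEX && !indisvalid;
}
```

The point of the two-line special case is that only the combination "system catalog + index + !indisvalid" is newly permitted; every other system relation stays protected.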
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jan 28, 2013 at 8:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-01-28 20:50:21 +0900, Michael Paquier wrote:
On Mon, Jan 28, 2013 at 8:44 PM, Andres Freund <andres@anarazel.de>
wrote:
Another argument that would be enough for a rejection of this patch by a
committer is the problem of invalid toast indexes that cannot be removed
cleanly by an operator. As long as there is not a clean solution for
that...

I think that part is relatively easy to fix, I wouldn't worry too
much.
The more complex part is how to get tuptoaster.c to update the
concurrently created index. That's what I worry about. Its not going
through the normal executor paths but manually updates the toast
index - which means it won't update the indisready && !indisvalid
index...

I included in the patch some stuff to update the reltoastidxid of the
parent relation of the toast index. Have a look at
index.c:index_concurrent_swap. The particular case I had in mind was if
there is a failure of the server during the concurrent reindex of a toast
index.

That's not enough unfortunately. The problem scenario is the following:
toast table: pg_toast.pg_toast_16384
toast index (via reltoastidxid): pg_toast.pg_toast_16384_index
REINDEX CONCURRENTLY PHASE #1
REINDEX CONCURRENTLY PHASE #2
toast table: pg_toast.pg_toast_16384
toast index (via reltoastidxid): pg_toast.pg_toast_16384_index, ready & valid
toast index (via pg_index): pg_toast.pg_toast_16384_index_tmp, ready & !valid

If a tuple gets toasted in this state tuptoaster.c will update
16384_index but not 16384_index_tmp. In normal tables this works because
nodeModifyTable uses ExecInsertIndexTuples which updates all ready
indexes. tuptoaster.c does something different though, it calls
index_insert exactly on the one expected index, not on the other ones.

Makes sense?
I didn't know toast indexes followed this code path. Thanks for the
details.
When server restarts, the toast relation will have an invalid index
and this cannot be dropped by an operator via SQL.

That requires about two lines of special case code in
RangeVarCallbackForDropRelation, that doesn't seem to be too bad to me.
I.e. allow the case where it's IsSystemClass(classform) && relkind ==
RELKIND_INDEX && !indisvalid.
OK, I thought it was more complicated.
--
Michael Paquier
http://michael.otacoo.com
Hi,
Please find attached a patch fixing 3 of the 4 problems reported before
(the patch does not contain docs).
1) Removal of the quadratic dependency with list_append_unique_oid
2) Minimization of the wait phase for parent relations, this is done in a
single transaction before phase 2
3) Authorization of the drop for invalid system indexes
The problem remaining is related to toast indexes. In current master code,
tuptoaster.c assumes that the index attached to the toast relation is
unique.
This creates a problem when running concurrent reindex on toast indexes,
because after phase 2, there is this problem:
pg_toast_index valid && ready
pg_toast_index_cct valid && !ready
The concurrent toast index that went through index_build is set as valid. So at
this instant, the index can be used when inserting new entries.
However, when inserting a new entry in the toast index, only the index
registered in reltoastidxid is used for insertion in
tuptoaster.c:toast_save_datum.
toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
This cannot work when there are concurrent toast indexes, as in this case
the toast index is assumed to be unique.
In order to fix that, it is necessary to extend toast_save_datum to insert
index data to the other concurrent indexes as well, and I am currently
thinking about two possible approaches:
1) Change reltoastidxid from oid type to oidvector to be able to manage
multiple toast index inserts. The concurrent indexes would be added in this
vector once built and all the indexes in this vector would be used by
tuptoaster.c:toast_save_datum. Not backward compatible but does it matter
for toast relations?
2) Add a new oidvector column in pg_class containing a vector of concurrent
toast index Oids built but not validated. toast_save_datum would scan this
vector and insert entries into any indexes present in it.
Comments as well as other ideas are welcome.
Thanks,
--
Michael
Attachments:
Attachment: 20130107_reindex_concurrently_v9.patch (application/octet-stream)
diff --git a/src/backend/access/transam/rmgr.c b/src/backend/access/transam/rmgr.c
index cc210a7..41d4379 100644
--- a/src/backend/access/transam/rmgr.c
+++ b/src/backend/access/transam/rmgr.c
@@ -24,23 +24,10 @@
#include "storage/standby.h"
#include "utils/relmapper.h"
+/* must be kept in sync with RmgrData definition in xlog_internal.h */
+#define PG_RMGR(symname,name,redo,desc,startup,cleanup,restartpoint) \
+ { name, redo, desc, startup, cleanup, restartpoint },
const RmgrData RmgrTable[RM_MAX_ID + 1] = {
- {"XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL},
- {"Transaction", xact_redo, xact_desc, NULL, NULL, NULL},
- {"Storage", smgr_redo, smgr_desc, NULL, NULL, NULL},
- {"CLOG", clog_redo, clog_desc, NULL, NULL, NULL},
- {"Database", dbase_redo, dbase_desc, NULL, NULL, NULL},
- {"Tablespace", tblspc_redo, tblspc_desc, NULL, NULL, NULL},
- {"MultiXact", multixact_redo, multixact_desc, NULL, NULL, NULL},
- {"RelMap", relmap_redo, relmap_desc, NULL, NULL, NULL},
- {"Standby", standby_redo, standby_desc, NULL, NULL, NULL},
- {"Heap2", heap2_redo, heap2_desc, NULL, NULL, NULL},
- {"Heap", heap_redo, heap_desc, NULL, NULL, NULL},
- {"Btree", btree_redo, btree_desc, btree_xlog_startup, btree_xlog_cleanup, btree_safe_restartpoint},
- {"Hash", hash_redo, hash_desc, NULL, NULL, NULL},
- {"Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, gin_safe_restartpoint},
- {"Gist", gist_redo, gist_desc, gist_xlog_startup, gist_xlog_cleanup, NULL},
- {"Sequence", seq_redo, seq_desc, NULL, NULL, NULL},
- {"SPGist", spg_redo, spg_desc, spg_xlog_startup, spg_xlog_cleanup, NULL}
+#include "access/rmgrlist.h"
};
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index db51e0b..6c7179d 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2654,7 +2654,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..448d2ba 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be a toast relation. Sufficient locks are normally taken on
+ * the related relations once this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation, in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1084,7 +1094,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1096,6 +1106,395 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build index for a concurrent operation. Low-level locks are taken when this
+ * operation is performed so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /*
+ * Now build the index, in the case of a parent relation being a toast
+ * relation, its reltoastidxid is updated when calling index_concurrent_swap.
+ */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace old index by new index in a concurrent context. For the time being
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ * For toast indexes, it is also necessary to modify reltoastidxid of the parent
+ * relation, so we also need to take RowExclusiveLock in this case until the
+ * end of the transaction block for this relation.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel, parentRel;
+
+ /*
+ * If the index swapped is a toast index, take a row exclusive lock on its
+ * parent toast relation before locking the involved indexes. It is necessary
+ * to lock the toast table first, as in this case its reltoastidxid is
+ * updated to the new index Oid.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ /* Open the parent toast relation */
+ parentRel = heap_open(parentOid, RowExclusiveLock);
+ }
+
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having index swapping relying on relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName((const char *) get_rel_name(oldIndexOid),
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of the old index to that of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * If the index swapped is a toast index, take an exclusive lock on its
+ * parent toast relation and then update reltoastidxid to the new index Oid
+ * value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ /* Update the statistics of this pg_class entry with new toast index Oid */
+ index_update_stats(parentRel, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(parentRel, RowExclusiveLock);
+ }
+
+ /*
+ * Scan for potential foreign keys on the index being swapped and change its
+ * dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Release the valid state of a given index and then invalidate the relcache
+ * of its parent relation. This function should be called when initializing
+ * an index drop in a concurrent context before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of an index concurrent
+ * process Deletion is done through performDeletion or dependencies of the
+ * index are not dropped. At this point all the indexes are already considered
+ * as invalid and dead so they can be dropped without using any concurrent
+ * options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index dropped here is not alive, it might be used by
+ * other backends in this case.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1326,7 +1725,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1408,17 +1806,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1446,63 +1835,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1515,13 +1849,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1943,6 +2271,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * for the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1956,7 +2286,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -2071,7 +2402,8 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
+ (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) &&
+ istoastupdate ?
RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
@@ -3189,7 +3521,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign keys references for a given index to a new index created
+ * concurrently. This process is used when swapping indexes for a concurrent
+ * process. All the constraints that are not referenced externally like primary
+ * keys or unique indexes should be switched using the structure of index.c for
+ * concurrent index creation and drop.
+ * This function takes care of also switching the dependencies of the foreign
+ * key from the old index to the new index in pg_depend.
+ *
+ * In order to complete this process, the following process is done:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to the
+ * parent relation of the index being swapped as conrelid.
+ * 2) Check in this list the foreign keys that use the old index as reference
+ * here with conindid
+ * 3) Update field conindid to the new index Oid on all the foreign keys
+ * 4) Switch dependencies of the foreign key to the new index
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated
+ * with the index by scanning using conrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c3385a1..493a085 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,526 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each step of the
+ * process is applied to all of the table's indexes at once, including the
+ * indexes of its dependent toast table.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including its
+ * associated toast table indexes. If the relkind is an index, this index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session lock
+ * is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index simply add its Oid to list. Invalid indexes
+ * cannot be included in list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. First,
+ * for each index a new index is created, based on the same definition as
+ * the former one; at this point it is only registered in the catalogs
+ * and will be built afterwards. This is done for all the indexes of a
+ * parent relation at once, including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a plain or a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of concurrent index, a lock is also needed on
+ * it
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save copies of the lockrelid to protect each concurrent relation from
+ * being dropped, then close the relations. The lockrelid of the parent
+ * relation is not saved here to avoid taking multiple locks on the same
+ * relation; instead we rely on parentRelationIds built earlier. Note
+ * that palloc'd copies must be appended: appending the address of the
+ * local variable would leave every list element pointing at the same
+ * storage.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock of each parent relation for the following wait
+ * phases, where other backends' transactions might conflict with this
+ * session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a copy of the parent relation's lockrelid to the list of locked
+ * relations; the address of the loop-local variable must not be used.
+ */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transaction will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes, in a separate transaction for each index
+ * to avoid keeping a transaction open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait until no running
+ * transaction could still have the parent table of an index open.
+ */
+
+ /* Perform the wait on each lock tag in a separate transaction */
+ StartTransactionCommand();
+ foreach(lc, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc);
+ Assert(localTag && localTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*localTag, ShareLock);
+ }
+ CommitTransactionCommand();
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so reopen
+ * it to fetch the information needed for the build, then close it
+ * again; the relation must not be accessed once it has been closed.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ relOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of the new index */
+ index_concurrent_build(relOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index to mark it as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase each concurrent index catches up with any insertions
+ * that might have occurred in the parent table in the meantime, and is
+ * marked as valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open for
+ * an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * this concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index can now be marked as valid -- update its
+ * pg_index entry.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent index is now valid in the sense that it contains all
+ * the necessary tuples. However, it might not contain tuples deleted
+ * just before the reference snapshot was taken, so we need to wait for
+ * the transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause other backends to refresh their view of
+ * the concurrent index, but the relcache of the parent relation needs to
+ * be invalidated as well.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation to swap them and to
+ * invalidate the relcache. ShareUpdateExclusiveLock is taken here,
+ * matching the session-level locks already held, to reduce the
+ * likelihood of deadlocks.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready, and we also need to
+ * wait for the transactions that might still use them. Each operation is
+ * performed in a separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and mark it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as
+ * this session is already known to hold sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion
+ * or related dependencies will not be dropped for the old indexes. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
+ * indexes are already considered as dead and invalid, so they will not
+ * be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the old index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index to be treated, do not commit transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
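As a reading aid for the six phases above, here is a minimal standalone sketch of the pg_index flag transitions the old and new indexes go through. All struct and function names are illustrative stand-ins, not the backend API; the real code updates the catalog through index_set_state_flags() and index_concurrent_set_dead().

```c
#include <stdbool.h>

/* Simplified model of the pg_index flags driven by REINDEX CONCURRENTLY */
typedef struct IndexState
{
	bool	isvalid;	/* usable for scans */
	bool	isready;	/* receives inserts */
	bool	isdead;		/* awaiting drop */
} IndexState;

/* Phase 1: the new index is catalogued but invalid and not ready */
IndexState
phase1_create(void)
{
	IndexState s = {false, false, false};
	return s;
}

/* Phase 2: after the concurrent build, mark the new index ready */
void
phase2_build(IndexState *new_index)
{
	new_index->isready = true;
}

/* Phase 3: after validation against the reference snapshot, mark valid */
void
phase3_validate(IndexState *new_index)
{
	new_index->isvalid = true;
}

/* Phase 4: swap; the old index loses its validity */
void
phase4_swap(IndexState *old_index)
{
	old_index->isvalid = false;
}

/* Phase 5: the old index stops receiving inserts and is marked dead,
 * so phase 6 can drop it safely */
void
phase5_set_dead(IndexState *old_index)
{
	old_index->isready = false;
	old_index->isdead = true;
}
```

After phase 5, the new index is valid and ready while the old one is dead and invisible to new transactions, which is what makes the phase 6 drop safe.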
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1956,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1983,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
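The "cct" suffix handling above can be pictured with a small standalone sketch. `choose_concurrent_name` is a hypothetical helper: the truncation rule is only an approximation of what ChooseRelationName()/makeObjectName() actually do, and the uniqueness check within the namespace is skipped entirely.

```c
#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64			/* identifier limit: 63 bytes plus NUL */

/* Hypothetical helper: append the "cct" suffix to the old index name,
 * truncating the base name so the result still fits in NAMEDATALEN. */
void
choose_concurrent_name(const char *oldname, char *buf, size_t buflen)
{
	const char *suffix = "cct";
	size_t		maxbase = NAMEDATALEN - 1 - strlen(suffix) - 1;	/* '_' too */
	size_t		baselen = strlen(oldname);

	if (baselen > maxbase)
		baselen = maxbase;
	snprintf(buf, buflen, "%.*s_%s", (int) baselen, oldname, suffix);
}
```

So an index `ind` gets a concurrent twin named `ind_cct`, and a name near the length limit is shortened before the suffix is attached.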
@@ -1672,18 +2102,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1747,18 +2181,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent && !ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2223,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2238,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed for system catalogs, but it is
+ * allowed at the database level.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system catalogs concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2329,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or a plain process.
+ * System relations cannot be reindexed concurrently, but they
+ * still need to be reindexed (including pg_class) with the
+ * normal process, as they could be corrupted and the concurrent
+ * process itself might use them. This does not include toast
+ * relations, which are reindexed when their parent relation is
+ * processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
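The per-relation decision made in the loop above boils down to a single predicate, sketched here with stand-in parameter names rather than the actual backend calls (`concurrent` and `IsSystemNamespace()` in the patch):

```c
#include <stdbool.h>

/* Stand-in for the per-relation test in ReindexDatabase(): the concurrent
 * path is taken only when it was requested and the relation does not live
 * in a system namespace. */
bool
use_concurrent_path(bool concurrent_requested, bool is_system_namespace)
{
	return concurrent_requested && !is_system_namespace;
}
```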
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eeddd9a..36bd576 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -891,6 +891,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check for the case of a system index that might have been invalidated
+ * by a failed concurrent operation, and allow it to be dropped.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
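The special case added to RangeVarCallbackForDropRelation above reduces to one predicate; the sketch below uses booleans standing in for the syscache lookups (IsSystemClass, relkind, indisvalid) and is not the backend API.

```c
#include <stdbool.h>

/* True when the early exit added above applies: dropping a system index
 * is permitted only because a failed concurrent operation left it
 * invalid; in every other case the normal permission checks proceed. */
bool
invalid_system_index_droppable(bool is_system, bool is_index, bool indisvalid)
{
	return is_system && is_index && !indisvalid;
}
```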
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..1890766 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when created in a concurrent context,
+ * and this code path cannot be reached by CREATE INDEX CONCURRENTLY,
+ * as that feature is not available for exclusion constraints. Hence an
+ * invalid index here can only come from REINDEX CONCURRENTLY, in which
+ * case a twin of this index exists in parallel and the check has
+ * already been performed on it, so it can be bypassed here. If
+ * exclusion constraints are ever supported by CREATE INDEX
+ * CONCURRENTLY, this will need to be removed or revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2da08d1..b9cd66b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3602,6 +3602,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9e313c8..c7a5345 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1841,6 +1841,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 342b796..60a6c96 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6672,29 +6672,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..b5d8cc0 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,117 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction holds a lock on the relation referred to by
+ * the given LOCKTAG that conflicts with the given lockmode. To do this,
+ * inquire which xacts currently would conflict with the lock -- ie,
+ * which ones hold a lock of a conflicting mode -- and then wait for
+ * each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have older snapshot than the given one,
+ * because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8904c6f..7360dda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1282,15 +1282,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1302,8 +1306,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/access/rmgr.h b/src/include/access/rmgr.h
index e4844fe..f6d2776 100644
--- a/src/include/access/rmgr.h
+++ b/src/include/access/rmgr.h
@@ -13,27 +13,23 @@ typedef uint8 RmgrId;
/*
* Built-in resource managers
*
- * Note: RM_MAX_ID could be as much as 255 without breaking the XLOG file
- * format, but we keep it small to minimize the size of RmgrTable[].
+ * The actual numerical values for each rmgr ID are defined by the order
+ * of entries in rmgrlist.h.
+ *
+ * Note: RM_MAX_ID must fit in RmgrId; widening that type will affect the XLOG
+ * file format.
*/
-#define RM_XLOG_ID 0
-#define RM_XACT_ID 1
-#define RM_SMGR_ID 2
-#define RM_CLOG_ID 3
-#define RM_DBASE_ID 4
-#define RM_TBLSPC_ID 5
-#define RM_MULTIXACT_ID 6
-#define RM_RELMAP_ID 7
-#define RM_STANDBY_ID 8
-#define RM_HEAP2_ID 9
-#define RM_HEAP_ID 10
-#define RM_BTREE_ID 11
-#define RM_HASH_ID 12
-#define RM_GIN_ID 13
-#define RM_GIST_ID 14
-#define RM_SEQ_ID 15
-#define RM_SPGIST_ID 16
+#define PG_RMGR(symname,name,redo,desc,startup,cleanup,restartpoint) \
+ symname,
+
+typedef enum RmgrIds
+{
+#include "access/rmgrlist.h"
+ RM_NEXT_ID
+} RmgrIds;
+
+#undef PG_RMGR
-#define RM_MAX_ID RM_SPGIST_ID
+#define RM_MAX_ID (RM_NEXT_ID - 1)
#endif /* RMGR_H */
diff --git a/src/include/access/rmgrlist.h b/src/include/access/rmgrlist.h
new file mode 100644
index 0000000..7ad71b3
--- /dev/null
+++ b/src/include/access/rmgrlist.h
@@ -0,0 +1,44 @@
+/*---------------------------------------------------------------------------
+ * rmgrlist.h
+ *
+ * The resource manager list is kept in its own source file for possible
+ * use by automatic tools. The exact representation of a rmgr is determined
+ * by the PG_RMGR macro, which is not defined in this file; it can be
+ * defined by the caller for special purposes.
+ *
+ * Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/access/rmgrlist.h
+ *---------------------------------------------------------------------------
+ */
+
+/* there is deliberately not an #ifndef RMGRLIST_H here */
+
+/*
+ * List of resource manager entries. Note that order of entries defines the
+ * numerical values of each rmgr's ID, which is stored in WAL records. New
+ * entries should be added at the end, to avoid changing IDs of existing
+ * entries.
+ *
+ * Changes to this list possibly need a XLOG_PAGE_MAGIC bump.
+ */
+
+/* symbol name, textual name, redo, desc, startup, cleanup, restartpoint */
+PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, NULL, NULL, NULL)
+PG_RMGR(RM_XACT_ID, "Transaction", xact_redo, xact_desc, NULL, NULL, NULL)
+PG_RMGR(RM_SMGR_ID, "Storage", smgr_redo, smgr_desc, NULL, NULL, NULL)
+PG_RMGR(RM_CLOG_ID, "CLOG", clog_redo, clog_desc, NULL, NULL, NULL)
+PG_RMGR(RM_DBASE_ID, "Database", dbase_redo, dbase_desc, NULL, NULL, NULL)
+PG_RMGR(RM_TBLSPC_ID, "Tablespace", tblspc_redo, tblspc_desc, NULL, NULL, NULL)
+PG_RMGR(RM_MULTIXACT_ID, "MultiXact", multixact_redo, multixact_desc, NULL, NULL, NULL)
+PG_RMGR(RM_RELMAP_ID, "RelMap", relmap_redo, relmap_desc, NULL, NULL, NULL)
+PG_RMGR(RM_STANDBY_ID, "Standby", standby_redo, standby_desc, NULL, NULL, NULL)
+PG_RMGR(RM_HEAP2_ID, "Heap2", heap2_redo, heap2_desc, NULL, NULL, NULL)
+PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, NULL, NULL, NULL)
+PG_RMGR(RM_BTREE_ID, "Btree", btree_redo, btree_desc, btree_xlog_startup, btree_xlog_cleanup, btree_safe_restartpoint)
+PG_RMGR(RM_HASH_ID, "Hash", hash_redo, hash_desc, NULL, NULL, NULL)
+PG_RMGR(RM_GIN_ID, "Gin", gin_redo, gin_desc, gin_xlog_startup, gin_xlog_cleanup, gin_safe_restartpoint)
+PG_RMGR(RM_GIST_ID, "Gist", gist_redo, gist_desc, gist_xlog_startup, gist_xlog_cleanup, NULL)
+PG_RMGR(RM_SEQ_ID, "Sequence", seq_redo, seq_desc, NULL, NULL, NULL)
+PG_RMGR(RM_SPGIST_ID, "SPGist", spg_redo, spg_desc, spg_xlog_startup, spg_xlog_cleanup, NULL)
diff --git a/src/include/access/xlog_internal.h b/src/include/access/xlog_internal.h
index ce9957e..34c6593 100644
--- a/src/include/access/xlog_internal.h
+++ b/src/include/access/xlog_internal.h
@@ -233,7 +233,10 @@ struct XLogRecord;
/*
* Method table for resource managers.
*
- * RmgrTable[] is indexed by RmgrId values (see rmgr.h).
+ * This struct must be kept in sync with the PG_RMGR definition in
+ * rmgr.c.
+ *
+ * RmgrTable[] is indexed by RmgrId values (see rmgrlist.h).
*/
typedef struct RmgrData
{
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..bbad5fe 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +107,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a cache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d8678e5..e5377b4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2521,6 +2521,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..0b591ce 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..4053b53
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE reind_con_tab CONCURRENTLY; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Hi Michael,
On 2013-02-07 16:45:57 +0900, Michael Paquier wrote:
Please find attached a patch fixing 3 of the 4 problems reported before
(the patch does not contain docs).
Cool!
1) Removal of the quadratic dependency with list_append_unique_oid
2) Minimization of the wait phase for parent relations, this is done in a
single transaction before phase 2
3) Authorization of the drop for invalid system indexes
I think there's also the issue of some minor changes required to make
exclusion constraints work.
The problem remaining is related to toast indexes. In current master code,
tuptoastter.c assumes that the index attached to the toast relation is
unique
This creates a problem when running concurrent reindex on toast indexes,
because after phase 2, there is this problem:
pg_toast_index valid && ready
pg_toast_index_cct valid && !ready
The concurrent toast index that went through index_build is set as valid. So at
this instant, the index can be used when inserting new entries.
Um, isn't pg_toast_index_cct !valid && ready?
However, when inserting a new entry in the toast index, only the index
registered in reltoastidxid is used for insertion in
tuptoaster.c:toast_save_datum.
toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
This cannot work when there are concurrent toast indexes as in this case
the toast index is thought to be unique. In order to fix that, it is necessary to extend toast_save_datum to insert
index data to the other concurrent indexes as well, and I am currently
thinking about two possible approaches:
1) Change reltoastidxid from oid type to oidvector to be able to manage
multiple toast index inserts. The concurrent indexes would be added in this
vector once built and all the indexes in this vector would be used by
tuptoaster.c:toast_save_datum. Not backward compatible but does it matter
for toast relations?
I don't see a problem breaking backward compat in that area.
2) Add new oidvector column in pg_class containing a vector of concurrent
toast index Oids built but not validated. toast_save_datum would scan this
vector and insert entries in index if there are any present in vector.
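Approaches 1 and 2 both boil down to making toast_save_datum loop over a set of index Oids rather than opening the single reltoastidxid. A rough standalone model of that loop (not backend code: the `sim_*` names, the plain array standing in for the oidvector, and the Oid values are all made up for illustration):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

#define MAX_TOAST_INDEXES 8

/* Records which index Oids received an entry, standing in for index_insert() */
static Oid inserted_into[MAX_TOAST_INDEXES];
static size_t n_inserted;

static void
sim_index_insert(Oid indexOid)
{
    inserted_into[n_inserted++] = indexOid;
}

/*
 * Model of a multi-index toast_save_datum: instead of opening the single
 * index from reltoastidxid, loop over every Oid in the (hypothetical)
 * oidvector and insert the chunk's index entry into each one, so a
 * not-yet-validated _cct index stays up to date as well.
 */
static void
sim_toast_save_chunk(const Oid *toastidxids, size_t nidx)
{
    for (size_t i = 0; i < nidx; i++)
        sim_index_insert(toastidxids[i]);
}
```

Either catalog representation (changing reltoastidxid to an oidvector, or a new pg_class column) would feed the same kind of loop.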
What about
3) Use reltoastidxid if != InvalidOid and manually build the list (using
RelationGetIndexList) otherwise? That should keep the additional
overhead minimal and should be relatively straightforward to implement?
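This fallback could be sketched roughly as follows (again a standalone model under stated assumptions, not actual backend code: `sim_reltoastidxid`, `sim_index_list`, and the Oid values simulate the catalog state and a RelationGetIndexList-style lookup):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Simulated catalog state: the cached toast index Oid, and the full index
 * list that a RelationGetIndexList()-style scan would return. */
static Oid sim_reltoastidxid = InvalidOid;
static const Oid sim_index_list[] = { 16401, 16402 };	/* old index + _cct copy */

/*
 * Pick the indexes to insert into: the single cached index when
 * reltoastidxid is valid, otherwise fall back to building the whole list.
 */
static size_t
toast_insert_indexes(const Oid **out)
{
    static Oid single[1];

    if (sim_reltoastidxid != InvalidOid)
    {
        single[0] = sim_reltoastidxid;
        *out = single;
        return 1;
    }
    *out = sim_index_list;
    return sizeof(sim_index_list) / sizeof(sim_index_list[0]);
}
```

In the common case reltoastidxid is valid and only one index is opened, so the extra overhead is confined to the window while a concurrent reindex is in flight.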
I think your patch accidentially squashed in some other changes (like
5a1cd89f8f), care to repost without?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
What about
3) Use reltoastidxid if != InvalidOid and manually build the list (using
RelationGetIndexList) otherwise?
Do we actually need reltoastidxid at all? I always thought having that
field was a case of premature optimization. There might be some case
for keeping it to avoid breaking any client-side code that might be
looking at it ... but if you're proposing changing the field contents
anyway, that argument goes right out the window.
regards, tom lane
On Thu, Feb 7, 2013 at 4:55 PM, Andres Freund <andres@2ndquadrant.com>wrote:
1) Removal of the quadratic dependency with list_append_unique_oid
2) Minimization of the wait phase for parent relations, this is done in a
single transaction before phase 2
3) Authorization of the drop for invalid system indexes
I think there's also the issue of some minor changes required to make
exclusion constraints work.
Thanks for reminding, I completely forgot this issue. I added a check with
a comment in execUtils.c:check_exclusion_constraint for that.
The problem remaining is related to toast indexes. In current master
code,
tuptoastter.c assumes that the index attached to the toast relation is
unique
This creates a problem when running concurrent reindex on toast indexes,
because after phase 2, there is this problem:
pg_toast_index valid && ready
pg_toast_index_cct valid && !ready
The concurrent toast index that went through index_build is set as valid. So at
this instant, the index can be used when inserting new entries.
Um, isn't pg_toast_index_cct !valid && ready?
You are right ;)
However, when inserting a new entry in the toast index, only the index
registered in reltoastidxid is used for insertion in
tuptoaster.c:toast_save_datum.
toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
This cannot work when there are concurrent toast indexes as in this case
the toast index is thought to be unique. In order to fix that, it is necessary to extend toast_save_datum to
insert
index data to the other concurrent indexes as well, and I am currently
thinking about two possible approaches:
1) Change reltoastidxid from oid type to oidvector to be able to manage
multiple toast index inserts. The concurrent indexes would be added in this
vector once built and all the indexes in this vector would be used by
tuptoaster.c:toast_save_datum. Not backward compatible but does it matter
for toast relations?
I don't see a problem breaking backward compat in that area.
Agreed. I though so.
2) Add new oidvector column in pg_class containing a vector of concurrent
toast index Oids built but not validated. toast_save_datum would scan this
vector and insert entries in index if there are any present in vector.
What about
3) Use reltoastidxid if != InvalidOid and manually build the list (using
RelationGetIndexList) otherwise? That should keep the additional
overhead minimal and should be relatively straightforward to implement?
OK. Here is a new idea.
I think your patch accidentially squashed in some other changes (like
5a1cd89f8f), care to repost without?
That's... well... unfortunate... Updated version attached.
--
Michael
Attachments:
20130107_reindex_concurrently_v9b.patch (application/octet-stream)
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index db51e0b..6c7179d 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2654,7 +2654,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..448d2ba 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. The index can
+ * also be on a toast relation. Sufficient locks are assumed to have been
+ * taken on the related relations when this is called during a concurrent
+ * operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation, in which case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1084,7 +1094,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1096,6 +1106,395 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /*
+ * Now build the index. If the parent relation is a toast relation, its
+ * reltoastidxid is updated later, when index_concurrent_swap is called.
+ */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index with the new index in a concurrent context. For the
+ * time being this only switches the relation names of the two indexes. If
+ * extra operations become necessary during a concurrent swap, they should be
+ * added here. AccessExclusiveLock is taken on the swapped index relations
+ * until the end of the transaction in which this function is called. For a
+ * toast index, it is also necessary to update reltoastidxid of the parent
+ * relation, so RowExclusiveLock is additionally taken on that relation until
+ * the end of the transaction.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel, parentRel;
+
+ /*
+ * If the index being swapped is a toast index, take RowExclusiveLock on its
+ * parent toast relation before locking the involved indexes; the lock on
+ * the toast table is needed because its reltoastidxid is updated to the
+ * new index Oid.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ /* Open pg_class and fetch a writable copy of the relation tuple */
+ parentRel = heap_open(parentOid, RowExclusiveLock);
+ }
+
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having index swapping relying on relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName(nameOld,
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * If the index swapped is a toast index, update reltoastidxid of its parent
+ * toast relation (already locked above) to the new index Oid value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ /* Update the pg_class entry of the parent with the new toast index Oid */
+ index_update_stats(parentRel, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(parentRel, RowExclusiveLock);
+ }
+
+ /*
+ * Scan for any foreign keys referencing the index being swapped and switch
+ * their references and dependencies to the new index.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of its
+ * parent relation. This function should be called at the start of a
+ * concurrent index drop, before the index is marked as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion is done through performDeletion; otherwise the
+ * dependencies of the index would not be dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it might
+ * still be used by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1326,7 +1725,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1408,17 +1806,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1446,63 +1835,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1515,13 +1849,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1943,6 +2271,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether the toast index Oid of the parent relation
+ * needs to be updated.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1956,7 +2286,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -2071,7 +2402,8 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
+ (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) &&
+ istoastupdate ?
RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
@@ -3189,7 +3521,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign key references from a given index to a new index created
+ * concurrently. This is used when swapping indexes during a concurrent
+ * reindex. Constraints that are not referenced externally, like primary keys
+ * or unique indexes, are switched through the machinery of index.c for
+ * concurrent index creation and drop. This function also switches the
+ * pg_depend dependencies of each foreign key from the old index to the new
+ * index.
+ *
+ * The following steps are taken:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to
+ *    the parent relation of the index being swapped, through confrelid.
+ * 2) Check which of those foreign keys use the old index as supporting
+ *    index, through conindid.
+ * 3) Update conindid to the new index Oid for all those foreign keys.
+ * 4) Switch the pg_depend entries of each foreign key to the new index.
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints referencing the
+ * parent relation of the index, scanning on confrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c3385a1..493a085 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,526 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is done in parallel with all the table's indexes as well as its dependent
+ * toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt from the relation
+ * Oid given by the caller. If the relkind of the given relation Oid is a
+ * table, all its valid indexes are rebuilt, including the indexes of its
+ * associated toast table. If the relkind is an index, only this index itself
+ * is rebuilt. The locks taken on the parent relations and the involved
+ * indexes are kept until this transaction is committed, to protect against
+ * schema changes that might occur before the session lock is taken on each
+ * relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index simply add its Oid to list. Invalid indexes
+ * cannot be included in list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We first
+ * need to create, for each old index, a new index based on the same columns
+ * and restrictions; it is only registered in the catalogs at this point and
+ * will be built later. This can be done for all the indexes of a parent
+ * relation at the same time, including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; it might be a plain table or a toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is needed on it as
+ * well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid of each index to protect it from being dropped, then
+ * close the relations. Each LockRelId is copied into palloc'd memory, as
+ * the list cells outlive this loop iteration; appending the address of the
+ * local variable would leave dangling pointers. The lockrelid of the parent
+ * relation is not saved here, to avoid taking multiple locks on the same
+ * relation; we rely instead on parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)),
+ &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)),
+ &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock of each parent relation for the upcoming wait phases;
+ * other backends whose snapshots might conflict with this session are
+ * waited on using these lock tags.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the parent relation's lockrelid to the list of
+ * locked relations; the local variable goes out of scope at the end of
+ * this loop iteration.
+ */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)),
+ &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid, so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation, the
+ * old index and its concurrent copy, to ensure that none of them are
+ * dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid keeping transactions open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait until no running
+ * transaction could still have the parent table of an index open.
+ */
+
+ /* Perform the wait on each lock tag in a separate transaction */
+ StartTransactionCommand();
+ foreach(lc, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc);
+ Assert(localTag && localTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*localTag, ShareLock);
+ }
+ CommitTransactionCommand();
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start a new transaction for this concurrent index build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so
+ * reopen it and save the fields needed after it is closed again.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapOid = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(heapOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible to other transactions.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked as valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent
+ * index validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not contain tuples deleted just before
+ * the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause other backends to update their
+ * entries for the concurrent index, but the relation cache needs to
+ * be invalidated as well.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation so that their relcache
+ * entries can be invalidated afterwards. ShareUpdateExclusiveLock is
+ * taken, matching the session-level locks already held, to reduce the
+ * likelihood of deadlocks.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for the transactions that might still use them. Each operation is
+ * performed in a separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session already holds sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion
+ * or related dependencies will not be dropped for the old indexes. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
+ * indexes are already considered as dead and invalid, so they will not
+ * be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the old index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index to be treated, do not commit transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1956,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1983,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2102,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Proceed with the concurrent or non-concurrent process */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1747,18 +2181,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent && !ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2223,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2238,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed on system catalogs, but it is
+ * on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2329,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself relies
+ * on them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eeddd9a..36bd576 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -891,6 +891,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..1890766 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when it was created in a concurrent
+ * context. Since this code path cannot be taken by CREATE INDEX
+ * CONCURRENTLY, which does not support exclusion constraints, it can
+ * only be reached by REINDEX CONCURRENTLY. In that case the same index
+ * exists in parallel to this one, so we can bypass this check, as it
+ * has already been done on the index existing in parallel. If exclusion
+ * constraints are supported by CREATE INDEX CONCURRENTLY in the future,
+ * this will have to be removed or completed for that purpose.
+ */
+ if (!index->rd_index->indisvalid)
+ return;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2da08d1..b9cd66b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3602,6 +3602,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9e313c8..c7a5345 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1841,6 +1841,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 342b796..60a6c96 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6672,29 +6672,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..b5d8cc0 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,117 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on the relation
+ * referred to by the given LOCKTAG. To do this, inquire which xacts
+ * currently would conflict with lockmode on that relation -- ie, which
+ * ones have a lock that permits writing it -- then wait for each of
+ * these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8904c6f..7360dda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1282,15 +1282,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1302,8 +1306,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..bbad5fe 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +107,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a syscache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d8678e5..e5377b4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2521,6 +2521,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..0b591ce 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2013-02-07 03:01:36 -0500, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > What about
> > 3) Use reltoastidxid if != InvalidOid and manually build the list (using
> > RelationGetIndexList) otherwise?
>
> Do we actually need reltoastidxid at all? I always thought having that
> field was a case of premature optimization.

I am a bit doubtful its really measurable as well. Really supporting a
dynamic number of indexes might be noticeable because we would need to
allocate memory et al for each toasted Datum, but only supporting one or
two seems easy enough.

The only advantage besides the dubious performance advantage of my
proposed solution is that less code needs to change as only
toast_save_datum() would need to change.

> There might be some case
> for keeping it to avoid breaking any client-side code that might be
> looking at it ... but if you're proposing changing the field contents
> anyway, that argument goes right out the window.

Well, it would only be 0/InvalidOid while being reindexed concurrently,
but yea.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Feb 7, 2013 at 5:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@2ndquadrant.com> writes:
What about
3) Use reltoastidxid if != InvalidOid and manually build the list (using
RelationGetIndexList) otherwise?
Do we actually need reltoastidxid at all? I always thought having that
field was a case of premature optimization. There might be some case
for keeping it to avoid breaking any client-side code that might be
looking at it ... but if you're proposing changing the field contents
anyway, that argument goes right out the window.
Here is an interesting idea. Could there be some performance impact if we
remove this field and use RelationGetIndexList instead to fetch the list
of indexes into which new entries need to be inserted?
--
Michael
On Thu, Feb 7, 2013 at 5:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-02-07 03:01:36 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
What about
3) Use reltoastidxid if != InvalidOid and manually build the list
(using
RelationGetIndexList) otherwise?
Do we actually need reltoastidxid at all? I always thought having that
field was a case of premature optimization.
I am a bit doubtful it's really measurable as well. Really supporting a
dynamic number of indexes might be noticeable because we would need to
allocate memory et al for each toasted Datum, but only supporting one or
two seems easy enough.
The only advantage besides the dubious performance advantage of my
proposed solution is that less code needs to change as only
toast_save_datum() would need to change.
There might be some case
for keeping it to avoid breaking any client-side code that might be
looking at it ... but if you're proposing changing the field contents
anyway, that argument goes right out the window.
Well, it would only be 0/InvalidOid while being reindexed concurrently,
but yea.
Removing reltoastidxid is more appealing for at least 2 reasons regarding
the current implementation of REINDEX CONCURRENTLY:
1) if reltoastidxid is set to InvalidOid during a concurrent reindex and
the reindex fails, how would it be possible to set it back to the correct
value? This would need more special code, which could become a maintenance
burden for sure.
2) There is already some special code in my patch to update reltoastidxid
to the new Oid value when swapping indexes. Removing that would honestly
make the index swapping cleaner.
Btw, I think that if this optimization for toast relations is done, it
should be a separate patch. Also, as I am not a specialist in toast
indexes, any opinion about potential performance impact (if any) is welcome
if we remove reltoastidxid and use RelationGetIndexList instead.
--
Michael
On 2013-02-07 17:28:53 +0900, Michael Paquier wrote:
On Thu, Feb 7, 2013 at 5:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-02-07 03:01:36 -0500, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
What about
3) Use reltoastidxid if != InvalidOid and manually build the list
(using
RelationGetIndexList) otherwise?
Do we actually need reltoastidxid at all? I always thought having that
field was a case of premature optimization.
I am a bit doubtful it's really measurable as well. Really supporting a
dynamic number of indexes might be noticeable because we would need to
allocate memory et al for each toasted Datum, but only supporting one or
two seems easy enough.
The only advantage besides the dubious performance advantage of my
proposed solution is that less code needs to change as only
toast_save_datum() would need to change.
There might be some case
for keeping it to avoid breaking any client-side code that might be
looking at it ... but if you're proposing changing the field contents
anyway, that argument goes right out the window.
Well, it would only be 0/InvalidOid while being reindexed concurrently,
but yea.
Removing reltoastidxid is more appealing for at least 2 reasons regarding
the current implementation of REINDEX CONCURRENTLY:
1) if reltoastidxid is set to InvalidOid during a concurrent reindex and
reindex fails, how would it be possible to set it back to the correct
value? This would need more special code, which could become a maintenance
burden for sure.
I would just let it stay slightly less efficient till the index is
dropped/reindexed.
Btw, I think that if this optimization for toast relations is done, it
should be a separate patch.
What do you mean by a separate patch? Commit it before committing
REINDEX CONCURRENTLY? If so, yes, sure. If you mean it can be fixed
later, I don't really see how, since this is an unresolved problem...
Also, as I am not a specialist in toast
indexes, any opinion about potential performance impact (if any) is welcome
if we remove reltoastidxid and use RelationGetIndexList instead.
Tom doubted it will be really measurable, so did I... If anything I
think it will be measurable during querying toast tables. So possibly we
would have to retain reltoastidxid for querying...
The minimal (not so nice) patch to make this correct probably is fairly
easy.
Changing only toast_save_datum:
Relation toastidx[2];
...
if (toastrel->rd_indexvalid == 0)
RelationGetIndexList(toastrel);
num_indexes = list_length(toastrel->rd_indexlist);
if (num_indexes == 1)
toastidx[0] = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
else if (num_indexes == 2)
{
int off = 0;
ListCell *l;
foreach(l, toastrel->rd_indexlist)
toastidx[off++] = index_open(lfirst_oid(l), RowExclusiveLock);
}
else
elog(ERROR, "unsupported number of indexes on toast relation");
...
for (cur_index = 0; cur_index < num_indexes; cur_index++)
index_insert(toastidx[cur_index], t_values, t_isnull,
&(toasttup->t_self),
toastrel,
toastidx[cur_index]->rd_index->indisunique ?
UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
...
for (cur_index = 0; cur_index < num_indexes; cur_index++)
index_close(toastidx[cur_index], RowExclusiveLock);
(that indisunique check seems like a copy&paste remnant btw).
Greetings,
Andres Freund
Hi,
On 2013-02-07 16:45:57 +0900, Michael Paquier wrote:
Please find attached a patch fixing 3 of the 4 problems reported before
(the patch does not contain docs).
1) Removal of the quadratic dependency with list_append_unique_oid
Afaics you now simply lock objects multiple times, is that right?
2) Minimization of the wait phase for parent relations, this is done in a
single transaction before phase 2
Unfortunately I don't think this did the trick. You currently have the
following:
+ /* Perform a wait on each session lock in a separate transaction */
+ StartTransactionCommand();
+ foreach(lc, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(lc);
+ Assert(localTag && localTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*localTag, ShareLock);
+ }
+ CommitTransactionCommand();
and
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&heaplocktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
To get rid of the issue you need to batch all the GetLockConflicts calls
together before doing any of the VirtualXactLocks. Otherwise other
backends will produce new conflicts on relation n+1 while you wait for
relation n.
So it would need to be something like:
void
WaitForVirtualLocksList(List *heaplocktags, LOCKMODE lockmode)
{
VirtualTransactionId **old_lockholders;
ListCell *lc;
int off = 0;
int i;
old_lockholders = palloc(sizeof(VirtualTransactionId *) *
list_length(heaplocktags));
/* collect the transactions we need to wait on for all relations */
foreach(lc, heaplocktags)
{
LOCKTAG *tag = lfirst(lc);
old_lockholders[off++] = GetLockConflicts(tag, lockmode);
}
/* wait on all transactions */
for (i = 0; i < off; i++)
{
VirtualTransactionId *lockholders = old_lockholders[i];
while (VirtualTransactionIdIsValid(*lockholders))
{
VirtualXactLock(*lockholders, true);
lockholders++;
}
}
}
Makes sense?
Greetings,
Andres Freund
On Tue, Feb 12, 2013 at 8:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-02-07 17:28:53 +0900, Michael Paquier wrote:
On Thu, Feb 7, 2013 at 5:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Btw, I think that if this optimization for toast relations is done, it
should be a separate patch.
What do you mean by a separate patch? Commit it before committing
REINDEX CONCURRENTLY? If so, yes, sure. If you mean it can be fixed
later, I don't really see how, since this is an unresolved problem...
Of course I meant that it would be necessary to validate the toast patch
first, as it is a prerequisite for REINDEX CONCURRENTLY. Sorry for not being
that clear.
Also, as I am not a specialist in toast
indexes, any opinion about potential performance impact (if any) is welcome
if we remove reltoastidxid and use RelationGetIndexList instead.
Tom doubted it will be really measurable, so did I... If anything I
think it will be measurable during querying toast tables. So possibly we
would have to retain reltoastidxid for querying...
The minimal (not so nice) patch to make this correct probably is fairly
easy.
Changing only toast_save_datum:
[... code ...]
Yes, I have spent a little bit of time looking at the code related to
reltoastidxid and thought about this possibility. It would make the changes
far easier with the existing patch, but it will also be necessary to update the
catalog pg_statio_all_tables to make the case where the OID is InvalidOid
correct with this catalog. However, I do not think it is as clean as simply
removing reltoastidxid and having all the toast APIs run consistent
operations, aka using only RelationGetIndexList.
--
Michael
On 2013-02-12 21:54:52 +0900, Michael Paquier wrote:
Changing only toast_save_datum:
[... code ...]
Yes, I have spent a little bit of time looking at the code related to
reltoastidxid and thought about this possibility. It would make the changes
far easier with the existing patch, it will also be necessary to update the
catalog pg_statio_all_tables to make the case where OID is InvalidOid
correct with this catalog.
What I proposed above wouldn't need the case where reltoastidxid =
InvalidOid, so no need to worry about that.
However, I do not think it is as clean as simply
removing reltoastidxid and having all the toast APIs run consistent
operations, aka using only RelationGetIndexList.
Sure. This just seems easier as it really only requires changes inside
toast_save_datum(), which mostly avoids any overhead (not even
additional palloc()s) if there is only one index.
That would lower the burden of proof that no performance regressions
exist (which I guess would be during querying) and the amount of
possibly external breakage due to removing the field...
Not sure what's the best way to do this when committing. But I think you
could incorporate something like the proposed change to continue working on
the patch. It really should only take some minutes to incorporate it.
Greetings,
Andres Freund
On Tue, Feb 12, 2013 at 10:04 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-02-12 21:54:52 +0900, Michael Paquier wrote:
Changing only toast_save_datum:
[... code ...]
Yes, I have spent a little bit of time looking at the code related to
reltoastidxid and thought about this possibility. It would make the changes
far easier with the existing patch, it will also be necessary to update the
catalog pg_statio_all_tables to make the case where OID is InvalidOid
correct with this catalog.
What I proposed above wouldn't need the case where reltoastidxid =
InvalidOid, so no need to worry about that.
[re-reading code...] Oh ok. I missed the point in your previous email. Yeah
indeed you are right.
However, I do not think it is as clean as simply
removing reltoastidxid and having all the toast APIs run consistent
operations, aka using only RelationGetIndexList.
Sure. This just seems easier as it really only requires changes inside
toast_save_datum(), which mostly avoids any overhead (not even
additional palloc()s) if there is only one index.
That would lower the burden of proof that no performance regressions
exist (which I guess would be during querying) and the amount of
possibly external breakage due to removing the field...
Not sure what's the best way to do this when committing. But I think you
could incorporate something like the proposed change to continue working on
the patch. It really should only take some minutes to incorporate it.
OK I'll add the changes you are proposing. I still want to have a look at
the approach for the removal of reltoastidxid btw.
--
Michael
Hi,
Please find attached a new version of the patch incorporating the 2 fixes
requested:
- Fix to insert new data into multiple toast indexes in toast_save_datum
if necessary
- Fix for the lock wait phase with the new function WaitForMultipleVirtualLocks,
which performs a wait on multiple locktags at the same time.
WaitForVirtualLocks also uses WaitForMultipleVirtualLocks, but on a single
locktag.
I am still looking at the approach removing reltoastidxid, an approach more
complicated but cleaner than what is currently done in the patch.
Regards,
On Tue, Feb 12, 2013 at 10:04 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-02-12 21:54:52 +0900, Michael Paquier wrote:
Changing only toast_save_datum:
[... code ...]
Yes, I have spent a little bit of time looking at the code related to
reltoastidxid and thought about this possibility. It would make the changes
far easier with the existing patch, it will also be necessary to update the
catalog pg_statio_all_tables to make the case where OID is InvalidOid
correct with this catalog.
What I proposed above wouldn't need the case where reltoastidxid =
InvalidOid, so no need to worry about that.
However, I do not think it is as clean as simply
removing reltoastidxid and having all the toast APIs run consistent
operations, aka using only RelationGetIndexList.
Sure. This just seems easier as it really only requires changes inside
toast_save_datum(), which mostly avoids any overhead (not even
additional palloc()s) if there is only one index.
That would lower the burden of proof that no performance regressions
exist (which I guess would be during querying) and the amount of
possibly external breakage due to removing the field...
Not sure what's the best way to do this when committing. But I think you
could incorporate something like the proposed change to continue working on
the patch. It really should only take some minutes to incorporate it.
Greetings,
Andres Freund
--
Michael
Attachments:
20130213_reindex_concurrently_v10.patchapplication/octet-stream; name=20130213_reindex_concurrently_v10.patchDownload
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 49f1553..f6a9f14 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1236,7 +1236,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidx;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1255,6 +1255,9 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ int count = 0;
+ int num_indexes;
+ ListCell *lc;
/*
* Open the toast relation and its index. We can use the index to check
@@ -1263,7 +1266,38 @@ toast_save_datum(Relation rel, Datum value,
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch the list of indexes for toast relation if necessary */
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Allocate enough space for all the relations */
+ toastidx = (Relation *)
+ palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * Now open a relation for each possible index. A toast relation can have
+ * multiple concurrent indexes created by REINDEX CONCURRENTLY that might
+ * have been created in parallel.
+ */
+ if (num_indexes == 1)
+ {
+ /*
+ * This is the case of a single index present, so use the one referenced
+ * directly in toast relation.
+ */
+ toastidx[0] = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ }
+ else
+ {
+ /*
+ * There are concurrent indexes existing with the toast index, so open
+ * relations on all of them to insert new toast value everywhere.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidx[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+ }
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1328,7 +1362,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidx[0]),
(AttrNumber) 1);
}
else
@@ -1376,13 +1410,14 @@ toast_save_datum(Relation rel, Datum value,
{
/*
* new value; must choose an OID that doesn't conflict in either
- * old or new toast table
+ * old or new toast table. An Oid value based on the first toast
+ * index in the list is used.
*/
do
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidx[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1425,12 +1460,14 @@ toast_save_datum(Relation rel, Datum value,
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
+ * Insertion is done on all the indexes of the toast relation.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (count = 0; count < num_indexes; count++)
+ index_insert(toastidx[count], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidx[count]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1447,8 +1484,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidx[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidx);
/*
* Create the TOAST pointer value that we'll return
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index db51e0b..6c7179d 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2654,7 +2654,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..448d2ba 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be a toast relation. Sufficient locks are normally taken on
+ * the related relations once this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1084,7 +1094,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1096,6 +1106,395 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so that only schema changes are prevented.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /*
+ * Now build the index. In the case of a parent relation being a toast
+ * relation, its reltoastidxid is updated when calling index_concurrent_swap.
+ */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new one in a concurrent context. For the time being
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ * For toast indexes, it is also necessary to modify reltoastidxid of the parent
+ * relation so we need also to take RowExclusiveLock in this case until the
+ * end of the transaction block for this relation.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel, parentRel;
+
+ /*
+ * If the index swapped is a toast index, take a row exclusive lock on its
+ * parent toast relation before the involved indexes, it is necessary to
+ * take a lock before the indexes on the toast table as in this case
+ * the reltoastidxid is updated to the new index Oid.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ /* Open pg_class and fetch a writable copy of the relation tuple */
+ parentRel = heap_open(parentOid, RowExclusiveLock);
+ }
+
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having index swapping relying on relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName((const char *) get_rel_name(oldIndexOid),
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * If the index swapped is a toast index, take an exclusive lock on its
+ * parent toast relation and then update reltoastidxid to the new index Oid
+ * value.
+ */
+ if (get_rel_relkind(parentOid) == RELKIND_TOASTVALUE)
+ {
+ /* Update the statistics of this pg_class entry with new toast index Oid */
+ index_update_stats(parentRel, false, false, newIndexOid, -1.0);
+
+ /* Close parent relation */
+ heap_close(parentRel, RowExclusiveLock);
+ }
+
+ /*
+ * Scan for potential foreign keys on the index being swapped and change its
+ * dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initializing
+ * a concurrent index drop, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion has to be done through performDeletion; otherwise, the
+ * dependencies of the index would not be dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index being dropped is not alive; if it is, it might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1326,7 +1725,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1408,17 +1806,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1446,63 +1835,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1515,13 +1849,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1943,6 +2271,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * of the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1956,7 +2286,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -2071,7 +2402,8 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
+ (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) &&
+ istoastupdate ?
RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
@@ -3189,7 +3521,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch the foreign key references of a given index to a new index created
+ * concurrently. This is used when swapping indexes during a concurrent
+ * process. Constraints that are not referenced externally, like primary
+ * keys or unique indexes, are switched using the facilities of index.c for
+ * concurrent index creation and drop.
+ * This function also takes care of switching the dependencies of the
+ * foreign keys from the old index to the new index in pg_depend.
+ *
+ * This is done in the following steps:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to
+ * the parent relation of the index being swapped through confrelid.
+ * 2) Check in this list for the foreign keys that use the old index as
+ * reference with conindid.
+ * 3) Update the conindid field to the new index Oid on all these foreign keys.
+ * 4) Switch the dependencies of the foreign keys to the new index.
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated
+ * with the index by scanning using confrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c3385a1..0f0d873 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,521 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given relation Oid. The relation can
+ * be either an index or a table. If a table is specified, each reindexing
+ * step is performed on all of the table's indexes at once, including its
+ * dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt, based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including its
+ * associated toast table indexes. If the relkind is an index, this index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session lock
+ * is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes to rebuild, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. For
+ * each index rebuilt, we first need to create a new index based on the
+ * same definition as the former one; it is only registered in the
+ * catalogs and will be built afterwards. All the operations can be
+ * performed at the same time on all the indexes of a parent relation,
+ * including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a plain or toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also
+ * needed on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each concurrent relation from being
+ * dropped, then close the relations. Each entry appended to the list
+ * is a palloc'd copy, so that it survives this loop iteration. The
+ * lockrelid of the parent relation is not saved here, to avoid taking
+ * multiple locks on the same relation; instead we rely on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock of each parent relation for the following wait
+ * phases, where other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add lockrelid of parent relation to the list of locked relations */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transaction will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, take a session-level lock on each parent relation,
+ * each old index and each concurrent index, to ensure that none of them
+ * are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid having open transactions for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait on the parent
+ * relations until no running transaction could have the parent table
+ * open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by the previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /*
+ * Perform the concurrent build of the new index. Close the old index
+ * only once its parent relation Oid has been used, so as not to
+ * dereference a closed relation.
+ */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index to mark it as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked as valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open for
+ * an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for validating the
+ * concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not contain tuples deleted just before the
+ * reference snapshot was taken, so we need to wait for the transactions
+ * that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause backends to update their entries for
+ * the concurrent index, but we also need to invalidate the relcache of
+ * the parent table.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation so that their cache entries
+ * can be invalidated. ShareUpdateExclusiveLock is taken here, matching
+ * the session-level lock already held, to reduce the likelihood of
+ * deadlock.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for transactions that might use them. Each operation is performed in a
+ * separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session already holds sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion
+ * or related dependencies will not be dropped for the old indexes. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
+ * indexes are already considered as dead and invalid, so they will not
+ * be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this old index and its dependent objects */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index to be treated, do not commit transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1951,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1978,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2097,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1747,18 +2176,30 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent && !ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2218,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2233,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for system catalogs, but it
+ * is for a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2324,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed, including pg_class, with the normal process,
+ * as they could be corrupted and the concurrent process might also
+ * use them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eeddd9a..36bd576 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -891,6 +891,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..1890766 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index only exists when created in a concurrent context, and
+ * this code path cannot be taken by CREATE INDEX CONCURRENTLY as that
+ * feature is not available for exclusion constraints, so this code path
+ * can only be taken by REINDEX CONCURRENTLY. In this case an identical
+ * index exists in parallel to this one, so we can bypass this check as
+ * it has already been done on that other index.
+ * If exclusion constraints are supported in the future for CREATE INDEX
+ * CONCURRENTLY, this will need to be removed or revisited accordingly.
+ */
+ if (!index->rd_index->indisvalid)
+ return;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2da08d1..b9cd66b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3602,6 +3602,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9e313c8..c7a5345 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1841,6 +1841,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index fee0531..e087e91 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6672,29 +6672,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the given
+ * relations. To do this, inquire which xacts currently would conflict with
+ * the given lockmode on the relation referred to by each LOCKTAG -- ie,
+ * which ones have a lock that permits writing the relation. Then wait for
+ * each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, as it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8904c6f..7360dda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1282,15 +1282,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1302,8 +1306,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..bbad5fe 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +107,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a cache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d8678e5..e5377b4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2521,6 +2521,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Hi all,
Please find attached a new set of 3 patches for REINDEX CONCURRENTLY (v11).
- 20130214_1_remove_reltoastidxid.patch
- 20130214_2_reindex_concurrently_v11.patch
- 20130214_3_reindex_concurrently_docs_v11.patch
Patch 1 needs to be applied before patches 2 and 3.
20130214_1_remove_reltoastidxid.patch is the patch removing reltoastidxid
(an approach mentioned by Tom) to allow the server to manipulate multiple
indexes of toast relations. Catalog views, system functions and pg_upgrade
have been updated accordingly, replacing the use of reltoastidxid with a
join on pg_index/pg_class. All the functions of tuptoaster.c now use
RelationGetIndexList to fetch the list of indexes of a given toast
relation. There are no warnings and regressions are passing (here only an
update of rules.out and oidjoins has been necessary).
20130214_2_reindex_concurrently_v11.patch depends on patch 1. It includes
the feature with all the fixes requested by Andres in his previous reviews.
Regressions are passing and I haven't seen any warnings. In this patch,
concurrent rebuild of toast indexes is fully supported thanks to patch 1.
The kludge used in the previous version to change reltoastidxid when
swapping indexes is not needed anymore, making the swap code far cleaner.
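As a quick usage sketch, with the grammar changes of this patch applied the
concurrent rebuild is invoked as follows (object names are the ones used in
the regression tests below):

```sql
-- Rebuild a single index, then all the indexes of a table (toast indexes
-- included thanks to patch 1), while the table stays available for
-- reads and writes.
REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
REINDEX TABLE CONCURRENTLY concur_reindex_tab;
```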
20130214_3_reindex_concurrently_docs_v11.patch includes the documentation
of REINDEX CONCURRENTLY. This might need some reshuffling with what is
written for CREATE INDEX CONCURRENTLY.
I am now pretty happy with the way the implementation is done, so I think
that the basic implementation architecture does not need to be changed.
Andres, I think that only a single round of review would be necessary now
before setting this patch as ready for committer. Thoughts?
Comments, as well as reviews are welcome.
--
Michael
Attachments:
20130214_1_remove_reltoastidxid.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 1905c43..f74b36b 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9144eec..e7ad6b1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 49f1553..1ba34c3 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1236,7 +1236,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1255,15 +1255,26 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int count = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
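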
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1325,10 +1336,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1382,7 +1396,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1421,16 +1435,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (count = 0; count < num_indexes; count++)
+ index_insert(toastidxs[count], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[count]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1447,8 +1463,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1473,10 +1491,13 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1485,10 +1506,20 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first index, but we need to take a lock
+ * on all of them.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1503,7 +1534,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1517,8 +1548,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1535,6 +1568,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1544,9 +1581,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for the scan.
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1590,7 +1628,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1605,6 +1643,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1620,11 +1661,18 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1643,7 +1691,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1732,8 +1780,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1749,7 +1799,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1772,6 +1822,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int count = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1814,11 +1867,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1859,7 +1919,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1956,8 +2016,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index db51e0b..ba0437a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c479c23..2154907 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -459,16 +459,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index c0cb2f6..9fb12e4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1151,8 +1151,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1361,18 +1359,53 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if all the indexes
+ * have valid OIDs.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);
+
+ /* Obtain index list if necessary */
+ if (toastRel1->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel1);
+ if (toastRel2->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) &&
+ !list_member_oid(toastRel2->rd_indexlist, InvalidOid) &&
+ list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair of indexes */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ heap_close(toastRel1, RowExclusiveLock);
+ heap_close(toastRel2, RowExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1496,12 +1529,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1510,11 +1545,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name, and the
+ * following entries are assumed to be concurrent indexes.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eeddd9a..eefadb2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8645,7 +8645,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8653,6 +8652,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8696,7 +8697,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8775,8 +8777,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 89ad386..0e11ba9 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -331,7 +331,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -339,8 +339,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -350,12 +350,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ if (toastRel->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Add in the sizes of all the indexes of the toast relation */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 43d571c..3480e16 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2503,10 +2503,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2515,7 +2514,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2540,11 +2538,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 2b8df5e..df68afe 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201302131
+#define CATALOG_VERSION_NO 201302151
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 820552f..363c0b6 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 869ca8c..470698a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1840,15 +1840,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
[Attachment: 20130214_2_reindex_concurrently_v11.patch]
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index ba0437a..baca453 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2653,7 +2653,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..a5d405b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is a duplicate of an
+ * existing index, for use during a concurrent operation. The index
+ * may also be on a toast relation. Sufficient locks on the related
+ * relations are already held when this is called during a concurrent
+ * operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation, in which case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1083,7 +1093,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1095,6 +1105,363 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Only low-level locks are taken
+ * here, so the operation blocks schema changes but not reads or writes.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new index in a concurrent context. For the time being
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel, parentRel;
+
+ /*
+ * Take a lock on the old and new indexes before switching their names.
+ * This avoids having the index swap rely on the relation renaming
+ * mechanism to lock the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName((const char *) get_rel_name(oldIndexOid),
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * Scan for potential foreign keys on the index being swapped and change its
+ * dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initializing an
+ * index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently, as the last step of a concurrent index
+ * process. Deletion is done through performDeletion, otherwise the
+ * dependencies of the index would not be dropped. At this point the indexes
+ * are already considered as invalid and dead, so they can be dropped without
+ * using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index dropped here is not alive; if it were, it might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1691,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1772,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1801,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1815,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1931,6 +2227,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * of the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1944,7 +2242,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -3174,7 +3473,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch the foreign key references of a given index to a new index
+ * created concurrently. This is used when swapping indexes during a
+ * concurrent process. Constraints that are not referenced externally,
+ * like primary keys or unique indexes, are switched by the concurrent
+ * index creation and drop machinery of index.c.
+ * This function also takes care of switching the dependencies of the
+ * foreign keys from the old index to the new index in pg_depend.
+ *
+ * In order to complete this process, the following steps are taken:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer
+ * to the parent relation of the index being swapped through confrelid.
+ * 2) Check in this list for the foreign keys that use the old index as
+ * reference, via conindid.
+ * 3) Update the field conindid to the new index Oid on all those foreign keys.
+ * 4) Switch the dependencies of those foreign keys to the new index.
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated with
+ * the index, scanning on confrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c3385a1..dd0b74d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,521 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing
+ * step is done on all of the table's indexes at the same time, as well as
+ * on its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt, based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including its
+ * associated toast table indexes. If the relkind is an index, this index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur until the session lock
+ * is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of concurrently rebuilding the indexes. We
+ * first need to create a new index, based on the same definition as the
+ * former index, that is only registered in the catalogs and will be
+ * built afterwards. It is possible to perform all these operations at
+ * the same time on all the indexes of a parent relation, including the
+ * indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index, which might be a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ NULL,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /* Now open the concurrent index relation; a lock is needed on it as well */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lock relation identifiers to protect each concurrent
+ * relation from being dropped, then close the relations. Note that
+ * palloc'd copies are stored in the list: appending the address of a
+ * local variable would leave dangling pointers behind. The lockrelid
+ * of the parent relation is not taken here to avoid taking multiple
+ * locks on the same relation; we rely instead on parentRelationIds
+ * built earlier.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks for the following visibility checks, as other
+ * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a palloc'd copy of the parent relation's lockrelid to the list */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close the heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transaction will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the
+ * concurrent index and its copy to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid keeping transactions open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait until no running
+ * transaction could still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start a new transaction for this concurrent index build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so reopen
+ * it and save what we need before closing it again: referencing
+ * indexRel->rd_index after index_close would access released memory.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapOid = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(heapOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs
+ * that might have occurred in the parent table, and are marked as valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping transactions open for
+ * an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for validating the
+ * concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not have taken into account tuples deleted
+ * before the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause backends to update their entries for
+ * the concurrent index, but the relation cache needs to be invalidated
+ * as well.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, so that it can no
+ * longer be used by other backends once the transaction doing the swap
+ * commits.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Mark the relation cache of the associated relation as invalid and
+ * open both relations. ShareUpdateExclusiveLock is taken here,
+ * consistent with the session-level locks already held, to reduce the
+ * likelihood of deadlock.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for transactions that might still use them. Each operation is
+ * performed in a separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session is already known to hold sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion,
+ * or the dependencies of the old indexes will not be dropped. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used here, as
+ * the indexes are already considered dead and invalid, so they will
+ * not be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this old index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index to be treated, do not commit transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * The last thing to do is to release the session-level locks on the
+ * parent table and on the indexes of the table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1951,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1978,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2097,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1747,18 +2176,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2221,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2236,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed on the system catalogs, but it
+ * is on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2327,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself relies
+ * on them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eefadb2..d9d44e0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -891,6 +891,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..1890766 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and this code path cannot be reached by CREATE INDEX
+ * CONCURRENTLY, as that feature is not available for exclusion
+ * constraints. Hence this code path can only be reached by REINDEX
+ * CONCURRENTLY, in which case an equivalent index exists in parallel
+ * with this one, so we can bypass this check: it has already been done
+ * on the other index. If exclusion constraints are ever supported by
+ * CREATE INDEX CONCURRENTLY, this will have to be removed or revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2da08d1..b9cd66b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3602,6 +3602,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9e313c8..c7a5345 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1841,6 +1841,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index fee0531..e087e91 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6672,29 +6672,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the
+ * relations referred to by the given LOCKTAGs. To do this, inquire
+ * which xacts currently would conflict with lockmode on each relation
+ * -- ie, which ones have a lock that permits writing the relation --
+ * then wait for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8904c6f..7360dda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1282,15 +1282,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1302,8 +1306,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..bbad5fe 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +107,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a cache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d8678e5..e5377b4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2521,6 +2521,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Attachment: 20130214_3_reindex_concurrently_docs_v11.patch
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..6d2cc53 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,111 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index that will replace the one to
+ be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to make the new
+ index valid for the other backends. Once this is done, the old
+ and fresh indexes are swapped, and the old index is marked as invalid
+ in a fourth transaction. Finally, two additional transactions are used to mark
+ the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and perform <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This also works with indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +385,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
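The documentation patch above describes a sequence of short transactions, each committed before the next begins, rather than one long index-building transaction. As a hedged illustration (the phase names below are invented for this sketch; the real implementation is C code in the backend), the sequence can be modeled like this:

```python
# Toy model of the multi-transaction sequence described above for
# REINDEX CONCURRENTLY. Phase names are illustrative only; the actual
# implementation is in PostgreSQL's C sources, not in Python.
PHASES = [
    "create new invalid index in catalogs",        # transaction 1
    "first table scan (build)",                    # transaction 2
    "second table scan (validate)",                # transaction 3
    "swap old and new indexes, mark old invalid",  # transaction 4
    "mark old index as not ready",                 # transaction 5
    "drop old index",                              # transaction 6
]

def run_phases(phases):
    """Each phase commits its own transaction before the next starts."""
    for n, phase in enumerate(phases, start=1):
        yield "transaction %d: %s" % (n, phase)

for step in run_phases(PHASES):
    print(step)
```

Splitting the work this way is what lets other backends keep reading and writing the table between phases, at the cost of having to wait out older snapshots at each step.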
On Thu, Feb 14, 2013 at 4:08 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Hi all,
Please find attached a new set of 3 patches for REINDEX CONCURRENTLY (v11).
- 20130214_1_remove_reltoastidxid.patch
- 20130214_2_reindex_concurrently_v11.patch
- 20130214_3_reindex_concurrently_docs_v11.patch
Patch 1 needs to be applied before patches 2 and 3.
20130214_1_remove_reltoastidxid.patch is the patch removing reltoastidxid
(approach mentioned by Tom) to allow server to manipulate multiple indexes
of toast relations. Catalog views, system functions and pg_upgrade have been
updated accordingly by replacing the use of reltoastidxid with a join on
pg_index/pg_class. All the functions of tuptoaster.c now use
RelationGetIndexList to fetch the list of indexes attached to a given
toast relation. There are no warnings, and regressions pass (here only an
update of rules.out and oidjoins has been necessary).
20130214_2_reindex_concurrently_v11.patch depends on patch 1. It includes
the feature with all the fixes requested by Andres in his previous reviews.
Regressions are passing and I haven't seen any warnings. In this patch,
concurrent rebuild of toast indexes is fully supported thanks to patch 1.
The kludge used in previous version to change reltoastidxid when swapping
indexes is not needed anymore, making swap code far cleaner.
20130214_3_reindex_concurrently_docs_v11.patch includes the documentation of
REINDEX CONCURRENTLY. This might need some reshuffling with what is written
for CREATE INDEX CONCURRENTLY.
I am now pretty happy with the way the implementation is done, so I think that
the basic implementation architecture does not need to be changed.
Andres, I think that only a single round of review would be necessary now
before setting this patch as ready for committer. Thoughts?
Comments, as well as reviews, are welcome.
When I compiled the HEAD with the patches, I got the following warnings.
index.c:1273: warning: unused variable 'parentRel'
execUtils.c:1199: warning: 'return' with no value, in function
returning non-void
When I ran REINDEX CONCURRENTLY for the same index from two different
sessions, I got the deadlock. The error log is:
ERROR: deadlock detected
DETAIL: Process 37121 waits for ShareLock on virtual transaction
2/196; blocked by process 36413.
Process 36413 waits for ShareUpdateExclusiveLock on relation 16457 of
database 12293; blocked by process 37121.
Process 37121: REINDEX TABLE CONCURRENTLY pgbench_accounts;
Process 36413: REINDEX TABLE CONCURRENTLY pgbench_accounts;
HINT: See server log for query details.
STATEMENT: REINDEX TABLE CONCURRENTLY pgbench_accounts;
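The DETAIL lines above describe a classic wait-for cycle: 37121 waits on a virtual-transaction lock held by 36413, which in turn waits on a relation lock held by 37121. As a toy illustration only (this is not PostgreSQL's actual deadlock detector, which works on its internal lock tables), a cycle in such a wait-for graph can be found like this:

```python
# Toy wait-for-graph cycle detection. The process IDs are the ones from
# the log above; "wait_for" maps each waiting process to the process
# holding the lock it needs.
def find_cycle(wait_for):
    """Return a deadlock cycle in a wait-for graph, or None if there is none."""
    for start in wait_for:
        seen = []
        node = start
        while node in wait_for:
            if node in seen:
                return seen[seen.index(node):]
            seen.append(node)
            node = wait_for[node]
    return None

# 37121 waits for a lock held by 36413, and 36413 waits for a lock
# held by 37121: a two-node cycle, hence the reported deadlock.
waits = {37121: 36413, 36413: 37121}
print(find_cycle(waits))  # → [37121, 36413]
```

With no cycle (e.g. a simple queue of waiters), `find_cycle` returns `None` and everyone eventually proceeds; the deadlock arises only because each session holds something the other needs.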
And, after the REINDEX CONCURRENTLY that survived the deadlock finished,
I found that a new index with another name had been created. It was NOT marked as
INVALID. Are these behaviors intentional?
=# \di pgbench_accounts*
List of relations
Schema | Name | Type | Owner | Table
--------+---------------------------+-------+----------+------------------
public | pgbench_accounts_pkey | index | postgres | pgbench_accounts
public | pgbench_accounts_pkey_cct | index | postgres | pgbench_accounts
(2 rows)
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for your review!
On Wed, Feb 20, 2013 at 12:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
When I compiled the HEAD with the patches, I got the following warnings.
index.c:1273: warning: unused variable 'parentRel'
execUtils.c:1199: warning: 'return' with no value, in function
returning non-void
Oops, corrected.
When I ran REINDEX CONCURRENTLY for the same index from two different
sessions, I got the deadlock. The error log is:
ERROR: deadlock detected
DETAIL: Process 37121 waits for ShareLock on virtual transaction
2/196; blocked by process 36413.
Process 36413 waits for ShareUpdateExclusiveLock on relation 16457
of
database 12293; blocked by process 37121.
Process 37121: REINDEX TABLE CONCURRENTLY pgbench_accounts;
Process 36413: REINDEX TABLE CONCURRENTLY pgbench_accounts;
HINT: See server log for query details.
STATEMENT: REINDEX TABLE CONCURRENTLY pgbench_accounts;
And, after the REINDEX CONCURRENTLY that survived the deadlock finished,
I found that a new index with another name had been created. It was NOT marked as
INVALID. Are these behaviors intentional?
This happens because of the following scenario:
- session 1: REINDEX CONCURRENTLY, which has not yet reached phase 3 where
indexes are validated. The necessary ShareUpdateExclusiveLock locks are taken
on the relations being rebuilt.
- session 2: REINDEX CONCURRENTLY, waits for a ShareUpdateExclusiveLock
lock to be obtained; its transaction begins before session 1 reaches phase 3
- session 1: enters phase 3, and fails at WaitForOldSnapshots as session 2
has an older snapshot and is currently waiting for a lock held by session 1
- session 2: succeeds, but concurrent index created by session 1 still
exists
A ShareUpdateExclusiveLock is taken on the index or table that is going to be
rebuilt just before calling ReindexRelationConcurrently. So the solution I
have here is to make REINDEX CONCURRENTLY fail for session 2. REINDEX
CONCURRENTLY is designed to allow DML to run on a table in parallel with the
operation, so it doesn't look strange to me to make session 2 fail if
REINDEX CONCURRENTLY is run in parallel on the same relation.
This fixes the problem of the concurrent index *_cct appearing after
session 1 failed due to the deadlock in Masao's report.
The patch correcting this problem is attached.
The error message could be improved; here is what it is now when session 2
fails:
postgres=# reindex table concurrently aa;
ERROR: could not obtain lock on relation "aa"
Comments?
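The behavior of the fix (fail fast rather than queue behind another in-progress REINDEX CONCURRENTLY) can be sketched abstractly. This is a hedged toy model using a Python lock, not the actual C implementation or its lock manager calls; the error text mirrors the message shown above:

```python
# Toy model of the fix: instead of queueing behind session 1's
# ShareUpdateExclusiveLock (and creating the wait-for cycle), session 2
# tries a conditional (non-blocking) acquire and errors out immediately.
import threading

relation_lock = threading.Lock()

def reindex_concurrently(relname):
    # Conditional acquire: do not wait if another session holds the lock.
    if not relation_lock.acquire(blocking=False):
        raise RuntimeError('could not obtain lock on relation "%s"' % relname)
    try:
        pass  # ... perform the concurrent rebuild phases ...
    finally:
        relation_lock.release()

reindex_concurrently("aa")        # session 1 path: lock is free, succeeds
relation_lock.acquire()           # simulate session 1 still holding the lock
try:
    reindex_concurrently("aa")    # session 2 path: fails fast instead of waiting
except RuntimeError as e:
    print(e)                      # → could not obtain lock on relation "aa"
finally:
    relation_lock.release()
```

Since session 2 never enters the lock wait queue, the cycle from the earlier report cannot form, at the cost of requiring the user to retry later.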
--
Michael
Attachments:
20130221_1_remove_reltoastidxid.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 1905c43..f74b36b 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9144eec..e7ad6b1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 49f1553..1ba34c3 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1236,7 +1236,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1255,15 +1255,26 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int count = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1325,10 +1336,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1382,7 +1396,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1421,16 +1435,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (count = 0; count < num_indexes; count++)
+ index_insert(toastidxs[count], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[count]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1447,8 +1463,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1473,10 +1491,13 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1485,10 +1506,20 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1503,7 +1534,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1517,8 +1548,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1535,6 +1568,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1544,9 +1581,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for scan
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1590,7 +1628,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1605,6 +1643,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1620,11 +1661,18 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1643,7 +1691,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1732,8 +1780,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1749,7 +1799,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1772,6 +1822,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int count = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1814,11 +1867,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1859,7 +1919,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1956,8 +2016,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index db51e0b..ba0437a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c479c23..2154907 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -459,16 +459,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index c0cb2f6..9fb12e4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1151,8 +1151,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1361,18 +1359,53 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if both toast tables have
+ * the same number of indexes and all the index OIDs are valid.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);
+
+ /* Obtain index list if necessary */
+ if (toastRel1->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel1);
+ if (toastRel2->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) &&
+ !list_member_oid(toastRel2->rd_indexlist, InvalidOid) &&
+ list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ heap_close(toastRel1, RowExclusiveLock);
+ heap_close(toastRel2, RowExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1496,12 +1529,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1510,11 +1545,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name, and the
+ * following entries are treated as concurrent indexes.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eeddd9a..eefadb2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8645,7 +8645,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8653,6 +8652,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8696,7 +8697,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8775,8 +8777,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 89ad386..0e11ba9 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -331,7 +331,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -339,8 +339,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -350,12 +350,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ if (toastRel->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Include the sizes of all the indexes of the toast relation */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 43d571c..3480e16 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2503,10 +2503,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2515,7 +2514,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2540,11 +2538,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ab91ab0..7d137b4 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201302181
+#define CATALOG_VERSION_NO 201302191
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 820552f..363c0b6 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 869ca8c..470698a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1840,15 +1840,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130221_2_reindex_concurrently_v12.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..6d2cc53 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To rebuild the index without interfering
+ with production you should either drop the index and reissue the
+ <command>CREATE INDEX CONCURRENTLY</> command, or use <command>REINDEX
+ CONCURRENTLY</>. Indexes of toast relations can also be rebuilt with
+ <command>REINDEX CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,111 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index that will replace the one
+ being rebuilt is first entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to make the new
+ index valid for the other backends. Once this is done, the old and
+ new indexes are swapped, and the old index is marked as invalid
+ in another transaction. Finally, two additional transactions are used
+ to mark the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the invalid
+ index and run <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This also applies to indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +385,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index ba0437a..baca453 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2653,7 +2653,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..9abf0e9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create a duplicate of an existing index as part
+ * of a concurrent operation. The index can belong to a toast relation.
+ * Sufficient locks are assumed to be already taken on the related
+ * relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1083,7 +1093,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1095,6 +1105,363 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so that only schema changes are prevented.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new index in a concurrent context. For the time
+ * being, this switches the relation names of the two indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel;
+
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having index swapping relying on relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName(nameOld,
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Rename the new index to the old index's original name */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally, rename the old index to the new index's original name */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The locks taken previously are not released until the end of the transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * Scan for potential foreign keys referencing the index being swapped and
+ * switch their dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
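
The name dance above can be illustrated outside PostgreSQL with a minimal
standalone sketch (plain C; the slot variables, the "_cct" suffix and the
"ind_tmp" placeholder are illustrative, not part of the patch):

```c
#include <assert.h>
#include <string.h>

/*
 * Toy "catalog": two relation name slots that must stay unique at every
 * step, mimicking the unique name constraint on pg_class.
 */
static char slot_old[64] = "ind";		/* index being rebuilt */
static char slot_new[64] = "ind_cct";	/* index built concurrently */

static void
rename_slot(char *slot, const char *newname)
{
	/* A rename must never produce a duplicate name */
	assert(strcmp(slot == slot_old ? slot_new : slot_old, newname) != 0);
	strcpy(slot, newname);
}

/*
 * Sketch of the swap in index_concurrent_swap(): three renames routed
 * through a temporary name, so the uniqueness invariant holds after
 * each individual step.
 */
static void
swap_index_names(void)
{
	char		nameOld[64],
				nameNew[64];

	strcpy(nameOld, slot_old);
	strcpy(nameNew, slot_new);

	rename_slot(slot_old, "ind_tmp");	/* old index -> temporary name */
	rename_slot(slot_new, nameOld);		/* new index -> old name */
	rename_slot(slot_old, nameNew);		/* old index -> new name */
}
```

A direct two-way rename would transiently violate name uniqueness, which is
why the function goes through a temporary name, with a
CommandCounterIncrement() after each step to make the catalog update visible.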
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and then invalidate the relcache
+ * of its parent relation. This function should be called when initiating an
+ * index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion is done through performDeletion; otherwise the
+ * dependencies of the index would not be dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
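
The target-selection logic of index_concurrent_drop can be condensed into a
small standalone sketch (the two catalog OID constants mirror
RelationRelationId and ConstraintRelationId; the struct and function names
are illustrative):

```c
#include <assert.h>

typedef unsigned int Oid;
#define InvalidOid				((Oid) 0)

/* Values mirroring the pg_class and pg_constraint relation OIDs */
#define RelationRelationId		1259
#define ConstraintRelationId	2606

typedef struct ObjectAddress
{
	Oid			classId;
	Oid			objectId;
	int			objectSubId;
} ObjectAddress;

/*
 * Sketch of the decision made by index_concurrent_drop(): a still-live
 * index is never dropped; a dead index backed by a constraint is dropped
 * through the constraint, otherwise through the index relation itself.
 * Returns 1 when *object has been filled in, 0 when there is nothing
 * to drop.
 */
static int
choose_drop_target(Oid indexOid, Oid constraintOid, int indislive,
				   ObjectAddress *object)
{
	if (indislive)
		return 0;				/* other backends may still use it */

	if (constraintOid != InvalidOid)
	{
		object->classId = ConstraintRelationId;
		object->objectId = constraintOid;
	}
	else
	{
		object->classId = RelationRelationId;
		object->objectId = indexOid;
	}
	object->objectSubId = 0;
	return 1;
}
```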
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1691,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1772,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1801,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1815,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1931,6 +2227,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * for the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1944,7 +2242,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -3174,7 +3473,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign key references from a given index to a new index created
+ * concurrently. This is used when swapping indexes in a concurrent process.
+ * Constraints that are not referenced externally, like primary keys or
+ * unique indexes, are switched using the machinery of index.c for
+ * concurrent index creation and drop.
+ * This function also takes care of switching the dependencies of the
+ * foreign keys from the old index to the new index in pg_depend.
+ *
+ * The following steps are performed:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to
+ * the parent relation of the index being swapped, through confrelid.
+ * 2) Check which foreign keys in this list use the old index as reference,
+ * through conindid.
+ * 3) Update the conindid field to the new index Oid on all those foreign keys.
+ * 4) Switch the dependencies of those foreign keys to the new index.
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints referencing the
+ * parent relation, scanning on confrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
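
The filter applied inside the scan above can be sketched as a standalone
fragment over an in-memory array standing in for the pg_constraint rows
returned by the confrelid scan (types, constants and OID values here are
illustrative):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

#define CONSTRAINT_FOREIGN	'f'

/* Simplified stand-in for the pg_constraint columns used by the scan */
typedef struct ConstraintRow
{
	char		contype;		/* constraint type */
	Oid			confrelid;		/* referenced relation */
	Oid			conindid;		/* supporting index */
} ConstraintRow;

/*
 * Sketch of switchIndexConstraintOnForeignKey(): repoint every foreign
 * key that references parentOid through oldIndexOid onto newIndexOid.
 * Returns how many rows were switched.
 */
static int
switch_fk_index(ConstraintRow *rows, size_t nrows,
				Oid parentOid, Oid oldIndexOid, Oid newIndexOid)
{
	size_t		i;
	int			nswitched = 0;

	for (i = 0; i < nrows; i++)
	{
		if (rows[i].contype == CONSTRAINT_FOREIGN &&
			rows[i].confrelid == parentOid &&
			rows[i].conindid == oldIndexOid)
		{
			rows[i].conindid = newIndexOid;
			nswitched++;
		}
	}
	return nswitched;
}
```

In the real function each switched row also gets its pg_depend entry moved
via changeDependencyFor(), which this sketch omits.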
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c3385a1..4351ce5 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,521 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is applied at once to all of the table's indexes, as well as to its
+ * dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including the
+ * indexes of its associated toast table. If the relkind is an index, this
+ * index itself will be rebuilt. The locks taken on the parent relations
+ * and the involved indexes are kept until this transaction is committed,
+ * to protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. For
+ * each index, we first create a new index based on the same definition
+ * as the former one; it is only registered in the catalogs and will be
+ * built later. It is possible to perform all these operations at the
+ * same time for a parent relation, including the indexes of its toast
+ * relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation; it might be a plain or toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also needed
+ * on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save copies of the lockrelid entries to protect each concurrent
+ * relation from being dropped, then close the relations. The lockrelid
+ * of the parent relation is not saved here, to avoid taking multiple
+ * locks on the same relation; instead we rely on parentRelationIds
+ * built earlier. Note that a pointer to the loop-local variable must
+ * not be put in the list, as it would not survive this iteration.
+ */
+ {
+ LockRelId *lockp = (LockRelId *) palloc(sizeof(LockRelId));
+
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ *lockp = lockrelid;
+ relationLocks = lappend(relationLocks, lockp);
+
+ lockp = (LockRelId *) palloc(sizeof(LockRelId));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ *lockp = lockrelid;
+ relationLocks = lappend(relationLocks, lockp);
+ }
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks for the following visibility checks; other backends
+ * might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a copy of the parent relation's lockrelid to the locked relations */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid, so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the
+ * concurrent index and its copy, to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index to
+ * avoid keeping transactions open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait on the parent
+ * relations until no running transaction could still have the parent
+ * table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /* Perform concurrent build of the new index */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index to mark it ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * the concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked as valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index not yet marked as valid. Each index validation is done in a
+ * separate transaction to avoid keeping a transaction open for an
+ * unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The concurrent index is now valid as it contains all the tuples
+ * necessary. However, it might not have taken into account tuples
+ * deleted before the reference snapshot was taken, so we need to wait
+ * for the transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The pg_index update will cause backends to update their entries for
+ * the concurrent index, but it is also necessary to invalidate the
+ * relcache.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation so that their relcache
+ * entries can be invalidated. ShareUpdateExclusiveLock is used here,
+ * matching the session-level locks already taken on these relations.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for transactions that might still use them. Each operation is
+ * performed in a separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and mark it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session already holds sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion
+ * or related dependencies will not be dropped for the old indexes. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
+ * indexes are already considered as dead and invalid, so they will not
+ * be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the old index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index to be treated, do not commit transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * Last thing to do is release the session-level locks on the parent
+ * table and its indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1951,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1978,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2097,27 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
-
- reindex_index(indOid, false);
+ /*
+ * Two REINDEX CONCURRENTLY operations cannot run in parallel on the
+ * same relation, so the lookup of indOid fails here if a
+ * ShareUpdateExclusiveLock is already taken on it.
+ */
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, concurrent,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
+
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1747,18 +2181,38 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
- /* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ /*
+ * The lock level used here should match reindex_relation().
+ * Two REINDEX CONCURRENTLY operations cannot run in parallel on the
+ * same relation, so the lookup of heapOid fails here if a
+ * ShareUpdateExclusiveLock is already taken on it.
+ */
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, concurrent,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2231,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2246,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for system catalogs, but it
+ * is allowed for the user tables of a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2337,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process might itself
+ * use them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eefadb2..d9d44e0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -891,6 +891,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check for the case of a system index that might have been left
+ * invalid by a failed concurrent operation, and allow it to be dropped.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index only exists when created in a concurrent context,
+ * and this code path cannot be taken by CREATE INDEX CONCURRENTLY,
+ * since that feature is not available for exclusion constraints; hence
+ * this code path can only be taken by REINDEX CONCURRENTLY. In this
+ * case the same index exists in parallel to this one, so we can bypass
+ * this check, as it has already been done on the parallel index.
+ * If exclusion constraints are ever supported for CREATE INDEX
+ * CONCURRENTLY, this will need to be removed or revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2da08d1..b9cd66b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3602,6 +3602,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9e313c8..c7a5345 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1841,6 +1841,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b998431..a4c2d6e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6680,29 +6680,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds any of the given relation locks. To do
+ * this, inquire which xacts currently would conflict with lockmode on the
+ * relation referred to by each LOCKTAG -- ie, which ones have a lock that
+ * permits writing the relation. Then wait for each of these xacts to
+ * commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was
+ * taken. Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8904c6f..7360dda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1282,15 +1282,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1302,8 +1306,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..bbad5fe 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +107,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a cache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d8678e5..e5377b4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2521,6 +2521,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/lock.h b/src/include/storage/lock.h
index 9642a19..d8e8f80 100644
--- a/src/include/storage/lock.h
+++ b/src/include/storage/lock.h
@@ -145,7 +145,8 @@ typedef uint16 LOCKMETHODID;
#define RowShareLock 2 /* SELECT FOR UPDATE/FOR SHARE */
#define RowExclusiveLock 3 /* INSERT, UPDATE, DELETE */
#define ShareUpdateExclusiveLock 4 /* VACUUM (non-FULL),ANALYZE, CREATE
- * INDEX CONCURRENTLY */
+ * INDEX CONCURRENTLY,
+ * REINDEX CONCURRENTLY */
#define ShareLock 5 /* CREATE INDEX (WITHOUT CONCURRENTLY) */
#define ShareRowExclusiveLock 6 /* like EXCLUSIVE MODE, but allows ROW
* SHARE */
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On Thu, Feb 21, 2013 at 11:55 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
A ShareUpdateExclusiveLock is taken on index or table that is going to be
rebuilt just before calling ReindexRelationConcurrently. So the solution I
have here is to make REINDEX CONCURRENTLY fail for session 2. REINDEX
CONCURRENTLY is made to allow a table to run DML in parallel to the
operation so it doesn't look strange to me to make session 2 fail if REINDEX
CONCURRENTLY is done in parallel on the same relation.
Thanks for updating the patch!
With the updated patch, REINDEX CONCURRENTLY seems to fail even when
ShareUpdateExclusiveLock is taken by a command other than REINDEX
CONCURRENTLY, for example VACUUM. Is this intentional? This behavior
should be avoided. Otherwise, users might need to disable autovacuum
whenever they run REINDEX CONCURRENTLY.
With the updated patch, unfortunately, I got a similar deadlock error when I
ran REINDEX CONCURRENTLY in session1 and ANALYZE in session2.
ERROR: deadlock detected
DETAIL: Process 70551 waits for ShareLock on virtual transaction
3/745; blocked by process 70652.
Process 70652 waits for ShareUpdateExclusiveLock on relation 17460 of
database 12293; blocked by process 70551.
Process 70551: REINDEX TABLE CONCURRENTLY pgbench_accounts;
Process 70652: ANALYZE pgbench_accounts;
HINT: See server log for query details.
STATEMENT: REINDEX TABLE CONCURRENTLY pgbench_accounts;
Like the original problem that I reported, the temporary index created by REINDEX
CONCURRENTLY was NOT marked as INVALID.
=# \di pgbench_accounts*
List of relations
Schema | Name | Type | Owner | Table
--------+---------------------------+-------+----------+------------------
public | pgbench_accounts_pkey | index | postgres | pgbench_accounts
public | pgbench_accounts_pkey_cct | index | postgres | pgbench_accounts
(2 rows)
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Feb 23, 2013 at 2:14 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Feb 21, 2013 at 11:55 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
A ShareUpdateExclusiveLock is taken on the index or table that is going to be
rebuilt just before calling ReindexRelationConcurrently. So the solution I
have here is to make REINDEX CONCURRENTLY fail for session 2. REINDEX
CONCURRENTLY is made to allow a table to run DML in parallel to the
operation, so it doesn't look strange to me to make session 2 fail if REINDEX
CONCURRENTLY is done in parallel on the same relation.
Thanks for updating the patch!
With the updated patch, REINDEX CONCURRENTLY seems to fail even when
ShareUpdateExclusiveLock is taken by a command other than REINDEX
CONCURRENTLY, for example, VACUUM. Is this intentional? This behavior
should be avoided. Otherwise, users might need to disable autovacuum
whenever they run REINDEX CONCURRENTLY.
With the updated patch, unfortunately, I got a similar deadlock error when I
ran REINDEX CONCURRENTLY in session 1 and ANALYZE in session 2.
Such deadlocks are also possible when running manual VACUUM with CREATE
INDEX CONCURRENTLY. This is because ANALYZE can be included in a
transaction that might do arbitrary operations on the parent table (see
comments in indexcmds.c) between the index build and validation. So the
only problem I see here is that the concurrent index is marked as VALID in
the transaction when a deadlock occurs and REINDEX CONCURRENTLY fails,
right?
ERROR: deadlock detected
DETAIL: Process 70551 waits for ShareLock on virtual transaction
3/745; blocked by process 70652.
Process 70652 waits for ShareUpdateExclusiveLock on relation 17460 of
database 12293; blocked by process 70551.
Process 70551: REINDEX TABLE CONCURRENTLY pgbench_accounts;
Process 70652: ANALYZE pgbench_accounts;
HINT: See server log for query details.
STATEMENT: REINDEX TABLE CONCURRENTLY pgbench_accounts;
Like the original problem that I reported, the temporary index created by REINDEX
CONCURRENTLY was NOT marked as INVALID.
=# \di pgbench_accounts*
List of relations
Schema | Name | Type | Owner | Table
--------+---------------------------+-------+----------+------------------
public | pgbench_accounts_pkey | index | postgres | pgbench_accounts
public | pgbench_accounts_pkey_cct | index | postgres | pgbench_accounts
(2 rows)
Btw, \di also prints invalid indexes...
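For reference, the validity flag can also be checked directly in the catalogs,
which distinguishes the invalid leftover from the valid original even when \di
prints both. A minimal sketch (assuming the _cct suffix convention used by this
patch for the concurrent indexes):

```sql
-- List concurrent indexes and whether they were left invalid.
SELECT c.relname AS index_name, i.indisvalid
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE c.relname LIKE '%\_cct' ESCAPE '\';
```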
OK, so what you want to see is the index being marked as not valid when a
deadlock occurs with REINDEX CONCURRENTLY because an ANALYZE kicks in (btw,
deadlocks are also possible with CREATE INDEX CONCURRENTLY when ANALYZE is
done on a table; in that case the index is marked as not valid). So indeed
there was a bug in my code for v12 and prior: if a deadlock occurred, the
concurrent index was marked as valid.
I have been able to fix that with the updated patch attached, which removes the
change done in v12 and checks for a deadlock at phase 3 before actually
marking the index as valid (the opposite was done in v11 and below,
making the indexes appear valid when the deadlock occurred).
So now here is what happens with a deadlock:
ioltas=# create table aa (a int);
CREATE TABLE
ioltas=# create index aap on aa (a);
CREATE INDEX
ioltas=# reindex index concurrently aap;
ERROR: deadlock detected
DETAIL: Process 32174 waits for ShareLock on virtual transaction 3/2;
blocked by process 32190.
Process 32190 waits for ShareUpdateExclusiveLock on relation 16385 of
database 16384; blocked by process 32174.
HINT: See server log for query details.
And how the relation remains after the deadlock:
ioltas=# \d aa
Table "public.aa"
Column | Type | Modifiers
--------+---------+-----------
a | integer |
Indexes:
"aap" btree (a)
"aap_cct" btree (a) INVALID
ioltas=# \di aa*
List of relations
Schema | Name | Type | Owner | Table
--------+---------+-------+--------+-------
public | aap | index | ioltas | aa
public | aap_cct | index | ioltas | aa
(2 rows)
The potential *problem* (which actually looks more like a non-problem) is
the case of REINDEX CONCURRENTLY run on a table with multiple indexes.
For example, let's take the case of a table with 2 indexes.
1) Session 1: Run REINDEX CONCURRENTLY on this table.
2) Session 2: Run ANALYZE on this table after 1st index has been validated
but before the 2nd index is validated
3) Session 1: fails due to a deadlock, leaving the table with 3 valid
indexes: the former 2 indexes and the 1st concurrent one that has been
validated. The 2nd concurrent index is marked as not valid.
This can happen when REINDEX CONCURRENTLY conflicts with the following
commands: CREATE INDEX CONCURRENTLY, another REINDEX CONCURRENTLY and
ANALYZE. Note that the 1st concurrent index is perfectly valid, so the user
can still drop the 1st old index after the deadlock.
So, in the case of a single index being rebuilt with REINDEX CONCURRENTLY
there are no problems, but there is a risk of multiplying the number of
indexes on a table when it is used to rebuild multiple indexes at the same
time with REINDEX TABLE CONCURRENTLY, or even REINDEX DATABASE
CONCURRENTLY. I think that this feature can live with that as long as the
user is aware of the risks when doing a REINDEX CONCURRENTLY that rebuilds
more than 1 index at the same time. Comments?
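To be clear, recovering after such a failure is straightforward: the invalid
concurrent index can simply be dropped, since the original index is untouched.
A sketch using the names from the transcript above:

```sql
-- The old index "aap" is still valid; discard the failed rebuild.
DROP INDEX aap_cct;
```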
--
Michael
Attachments:
20130226_1_remove_reltoastidxid.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 1905c43..f74b36b 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9144eec..e7ad6b1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 49f1553..1ba34c3 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1236,7 +1236,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1255,15 +1255,26 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int count = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1325,10 +1336,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1382,7 +1396,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1421,16 +1435,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (count = 0; count < num_indexes; count++)
+ index_insert(toastidxs[count], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[count]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1447,8 +1463,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1473,10 +1491,13 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1485,10 +1506,20 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1503,7 +1534,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1517,8 +1548,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1535,6 +1568,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1544,9 +1581,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for scan
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1590,7 +1628,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1605,6 +1643,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1620,11 +1661,18 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1643,7 +1691,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1732,8 +1780,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1749,7 +1799,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1772,6 +1822,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int count = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1814,11 +1867,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1859,7 +1919,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1956,8 +2016,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index db51e0b..ba0437a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c479c23..2154907 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -459,16 +459,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index c0cb2f6..9fb12e4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1151,8 +1151,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1361,18 +1359,53 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can actually be safely done only if all the indexes
+ * have valid Oids.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);
+
+ /* Obtain index list if necessary */
+ if (toastRel1->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel1);
+ if (toastRel2->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) &&
+ !list_member_oid(toastRel2->rd_indexlist, InvalidOid) &&
+ list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ heap_close(toastRel1, RowExclusiveLock);
+ heap_close(toastRel2, RowExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1496,12 +1529,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1510,11 +1545,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries are thought as being concurrent indexes.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eeddd9a..eefadb2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8645,7 +8645,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8653,6 +8652,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8696,7 +8697,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8775,8 +8777,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 11b0040..89a1445 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ if (toastRel->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated based on the first index available */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 43d571c..3480e16 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2503,10 +2503,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2515,7 +2514,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2540,11 +2538,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ab91ab0..7d137b4 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201302181
+#define CATALOG_VERSION_NO 201302191
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 820552f..363c0b6 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 869ca8c..470698a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1840,15 +1840,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
[Attachment: 20130226_2_reindex_concurrently_v13.patch (application/octet-stream)]
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..6d2cc53 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should either drop the index and reissue
+ <command>CREATE INDEX CONCURRENTLY</>, or use <command>REINDEX
+ CONCURRENTLY</>. Indexes of toast relations can be rebuilt with
+ <command>REINDEX CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,111 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and, in
+ addition, must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete, as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index that will replace the one being
+ rebuilt is first entered into the system catalogs in one transaction.
+ Two table scans then occur in two more transactions, to build the new
+ index and make it valid for the other backends. Once this is done, the
+ old and new indexes are swapped, and the old index is marked as invalid,
+ in another transaction. Finally, two additional transactions are used to
+ mark the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and then perform <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This also works with indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds concurrently only the non-system relations. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +385,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild all the indexes of a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index ba0437a..baca453 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2653,7 +2653,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..9abf0e9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create a duplicate of an existing index for use
+ * during a concurrent operation. This index can also be on a toast
+ * relation. Sufficient locks are normally already taken on the related
+ * relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in this case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1083,7 +1093,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1095,6 +1105,363 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+
+ /* Pick up column name from the relation */
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
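(Reviewer note: the column-name collection loop in index_concurrent_create() maps 1-based attribute numbers back to attribute names. A minimal standalone C sketch of that mapping, with invented names and fixed-size buffers for illustration only, not backend code:)

```c
#include <assert.h>
#include <string.h>

/*
 * Illustrative miniature of the loop in index_concurrent_create() that
 * builds the list of column names from indexed attribute numbers.
 * Attribute numbers are 1-based, as in pg_attribute.
 */
static int
collect_column_names(const int *attnums, int nattrs,
                     const char **relcols, const char **out)
{
    int i;

    for (i = 0; i < nattrs; i++)
        out[i] = relcols[attnums[i] - 1]; /* 1-based -> 0-based */
    return nattrs;
}
```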
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so that only schema changes are prevented.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new index in a concurrent context. For the
+ * time being, what is done here is switching the relation names of the
+ * indexes. If extra operations are necessary during a concurrent swap,
+ * processing should be added here. AccessExclusiveLock is taken on the
+ * index relations that are swapped, until the end of the calling transaction.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel;
+
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having index swapping relying on relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName((const char *) get_rel_name(oldIndexOid),
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Give the new index the name of the old index */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally, give the old index the former name of the new index */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * Scan for potential foreign keys on the index being swapped and change its
+ * dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
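(Reviewer note: the three renames in index_concurrent_swap() amount to a classic exchange through a temporary name. A minimal standalone C sketch of that exchange, where string buffers stand in for RenameRelationInternal() calls; names and sizes here are illustrative, not backend code:)

```c
#include <assert.h>
#include <string.h>

#define NAMEBUF 64              /* stand-in for NAMEDATALEN */

/*
 * Sketch of the three renames in index_concurrent_swap(): rename the old
 * index to a temporary name, give the new index the old name, then give
 * the old index the new index's former name.
 */
static void
swap_names(char old_name[NAMEBUF], char new_name[NAMEBUF])
{
    char tmp[NAMEBUF];

    strcpy(tmp, old_name);      /* step 1: old index -> temporary name */
    strcpy(old_name, new_name); /* step 2: new index takes the old name */
    strcpy(new_name, tmp);      /* step 3: old index takes the new name */
}
```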
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and then invalidate the relcache
+ * of its parent relation. This function should be called when initiating an
+ * index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent index process.
+ * Deletion is done through performDeletion, or dependencies of the index
+ * would not be dropped. At this point all the indexes are already
+ * considered invalid and dead, so they can be dropped without using any
+ * concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index dropped here is not alive, it might be used by
+ * other backends in this case.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
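(Reviewer note: the overall progression driven by index_concurrent_clear_valid(), index_concurrent_set_dead() and index_concurrent_drop() can be summarized as a tiny state machine. This standalone C sketch is illustrative only: the enum names are invented, and in the patch each real transition happens in its own transaction with waits in between:)

```c
#include <assert.h>

/*
 * Hedged sketch of the index state progression during a concurrent drop
 * or swap. Names are invented for illustration; the real flags live in
 * pg_index (indisvalid, indisready, indislive).
 */
typedef enum
{
    IDX_VALID,      /* index usable for queries and inserts */
    IDX_INVALID,    /* indisvalid cleared: ignored by new queries */
    IDX_DEAD,       /* indisready/indislive cleared: no more inserts */
    IDX_DROPPED     /* catalog entries removed by performDeletion() */
} IdxPhase;

/* Advance one phase; a dropped index stays dropped. */
static IdxPhase
next_phase(IdxPhase p)
{
    return (p == IDX_DROPPED) ? IDX_DROPPED : (IdxPhase) (p + 1);
}
```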
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1691,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1772,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1801,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1815,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1931,6 +2227,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether the toast index Oid of the parent relation
+ * needs to be updated.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1944,7 +2242,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -3174,7 +3473,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch the foreign key references of a given index to a new index created
+ * concurrently. This is used when swapping indexes in a concurrent process.
+ * Constraints that are not referenced externally, like primary keys or
+ * unique indexes, are switched using the machinery of index.c for concurrent
+ * index creation and drop.
+ * This function also switches the dependencies of the foreign key from the
+ * old index to the new index in pg_depend.
+ *
+ * This is done in the following steps:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to
+ * the parent relation of the index being swapped through confrelid.
+ * 2) Find in this list the foreign keys that use the old index as the
+ * referenced index, through conindid.
+ * 3) Update the conindid field to the new index Oid on all those foreign keys.
+ * 4) Switch the dependencies of each foreign key to the new index.
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated
+ * with the index by scanning using conrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c3385a1..ce4e994 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,521 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is done in parallel with all the table's indexes as well as its dependent
+ * toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including the
+ * indexes of its associated toast table. If relkind is an index, this
+ * index itself will be rebuilt. The locks taken on parent relations and
+ * involved indexes are kept until this transaction is committed to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation the index is based on cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index, a new index based on the same
+ * definition as the former one; it is only registered in the catalogs
+ * at this point and will be built afterwards. All these operations can
+ * be performed at once for a parent relation, including the indexes of
+ * its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a toast or plain relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also needed
+ * on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each concurrent relation from being
+ * dropped, then close the relations. The lockrelid of the parent
+ * relation is not saved here to avoid taking multiple locks on the
+ * same relation; instead we rely on parentRelationIds built earlier.
+ * Each list entry must be a palloc'd copy: appending the address of
+ * the loop-local variable would store the same, soon-dangling pointer
+ * for every iteration.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the LOCKTAG of each parent relation for the following wait
+ * phases, where we check for other backends that might conflict with
+ * this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add the lockrelid of the parent relation to the list of locked
+ * relations. A palloc'd copy is needed here; pointing at a loop-local
+ * variable would leave a dangling pointer in the list.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the old
+ * index and its concurrent copy, to ensure that none of them are
+ * dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction to avoid
+ * keeping transactions open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait until no running
+ * transaction could still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by the previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /*
+ * Perform concurrent build of the new index. Note that indexRel must
+ * not be closed before its indrelid field is used here.
+ */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table and are marked as valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * This concurrent index is now valid, as it contains all the necessary
+ * tuples. However, it might not contain tuples deleted just before the
+ * reference snapshot was taken, so we need to wait out the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The pg_index update will cause backends to update their entries for
+ * the concurrent index, but the relcache entries also need refreshing.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it unusable by
+ * other backends once the swapping transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation so the cache of the
+ * associated relation can be invalidated afterwards.
+ * ShareUpdateExclusiveLock is sufficient here, as a session-level lock
+ * of the same level is already held on both relations.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for transactions that might still use them. Each operation is
+ * performed in a separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session is already known to hold sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion
+ * or related dependencies will not be dropped for the old indexes. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used as here the
+ * indexes are already considered as dead and invalid, so they will not
+ * be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this old index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index to be treated, do not commit transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * Last thing to do is release the session-level locks on the parent
+ * table and its indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1951,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1978,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2097,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1747,18 +2176,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2221,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2236,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed on system catalogs, but it
+ * is on the user relations of a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2327,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with a normal process,
+ * as they could be corrupted and the concurrent process itself might
+ * use them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eefadb2..d9d44e0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -891,6 +891,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and this code path cannot be reached by CREATE INDEX
+ * CONCURRENTLY since that feature is not available for exclusion
+ * constraints; hence it can only be reached by REINDEX CONCURRENTLY.
+ * In that case the same index exists in parallel to this one, so we can
+ * bypass this check as it has already been done on the parallel index.
+ * If exclusion constraints are ever supported by CREATE INDEX
+ * CONCURRENTLY, this will need to be removed or reworked.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 2da08d1..b9cd66b 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3602,6 +3602,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 9e313c8..c7a5345 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1841,6 +1841,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index b998431..a4c2d6e 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6680,29 +6680,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transactions hold conflicting locks on any of the relations
+ * referred to by the given LOCKTAGs. To do this, inquire which xacts
+ * currently would conflict with lockmode on each relation -- ie, which ones
+ * have a lock that permits writing the relation. Then wait for each of
+ * these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8904c6f..7360dda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1282,15 +1282,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1302,8 +1306,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..bbad5fe 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +107,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used for a cache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d8678e5..e5377b4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2521,6 +2521,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Andres, Masao, do you need an extra round of review, or do you think this is
ready to be marked as ready for committer?
On my side I have nothing more to add to the existing patches.
Thanks,
--
Michael
Hi,
Michael Paquier <michael.paquier@gmail.com> schrieb:
Andres, Masao, do you need an extra round or review or do you think
this is
ready to be marked as committer?
On my side I have nothing more to add to the existing patches.
I think they do need review before that - I won't be able to do another review before the weekend though.
Andres
---
Please excuse brevity and formatting - I am writing this on my mobile phone.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Feb 28, 2013 at 4:56 PM, anarazel@anarazel.de <andres@anarazel.de> wrote:
Hi,
Michael Paquier <michael.paquier@gmail.com> schrieb:
Andres, Masao, do you need an extra round or review or do you think
this is
ready to be marked as committer?
On my side I have nothing more to add to the existing patches.

I think they do need review before that - I won't be able to do another
review before the weekend though.
Sure. Thanks.
--
Michael
On Thu, Feb 28, 2013 at 3:21 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Andres, Masao, do you need an extra round or review or do you think this is
ready to be marked as committer?
On my side I have nothing more to add to the existing patches.
Sorry for the late reply.
I found one problem in the latest patch. I got the segmentation fault
when I executed the following SQLs.
CREATE TABLE hoge (i int);
CREATE INDEX hogeidx ON hoge(abs(i));
INSERT INTO hoge VALUES (generate_series(1,10));
REINDEX TABLE CONCURRENTLY hoge;
The error messages are:
LOG: server process (PID 33641) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY hoge;
Regards,
--
Fujii Masao
On Thu, Feb 28, 2013 at 11:26 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
I found one problem in the latest patch. I got the segmentation fault
when I executed the following SQLs:

CREATE TABLE hoge (i int);
CREATE INDEX hogeidx ON hoge(abs(i));
INSERT INTO hoge VALUES (generate_series(1,10));
REINDEX TABLE CONCURRENTLY hoge;

The error messages are:
LOG: server process (PID 33641) was terminated by signal 11: Segmentation
fault
DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY hoge;
Oops. Index expressions were not correctly extracted when building
columnNames for index_create in index_concurrent_create.
Fixed in this new patch. Thanks for catching that.
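For reference, a quick manual way to double-check the fix (a hypothetical session building on Fujii's test case, not part of the patch's regression tests) is to re-run the reproduction and confirm the rebuilt index still carries its expression:

CREATE TABLE hoge (i int);
CREATE INDEX hogeidx ON hoge(abs(i));
INSERT INTO hoge VALUES (generate_series(1,10));
REINDEX TABLE CONCURRENTLY hoge;
-- the definition should still mention abs(i) after the rebuild
SELECT pg_get_indexdef('hogeidx'::regclass);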
--
Michael
Attachments:
20130301_1_remove_reltoastidxid.patchapplication/octet-stream; name=20130301_1_remove_reltoastidxid.patchDownload
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index 1905c43..f74b36b 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 9144eec..e7ad6b1 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 49f1553..1ba34c3 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1236,7 +1236,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1255,15 +1255,26 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int count = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1325,10 +1336,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1382,7 +1396,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1421,16 +1435,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (count = 0; count < num_indexes; count++)
+ index_insert(toastidxs[count], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[count]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1447,8 +1463,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1473,10 +1491,13 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1485,10 +1506,20 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first index, but we need to take a lock on
+ * all of them.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1503,7 +1534,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1517,8 +1548,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1535,6 +1568,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1544,9 +1581,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for scan
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1590,7 +1628,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1605,6 +1643,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1620,11 +1661,18 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1643,7 +1691,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1732,8 +1780,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1749,7 +1799,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1772,6 +1822,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int count = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1814,11 +1867,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1859,7 +1919,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1956,8 +2016,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index db51e0b..ba0437a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index c479c23..2154907 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -459,16 +459,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index c0cb2f6..9fb12e4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1151,8 +1151,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1361,18 +1359,53 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if all the indexes
+ * have valid OIDs.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);
+
+ /* Obtain index list if necessary */
+ if (toastRel1->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel1);
+ if (toastRel2->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) &&
+ !list_member_oid(toastRel2->rd_indexlist, InvalidOid) &&
+ list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair of indexes */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ heap_close(toastRel1, RowExclusiveLock);
+ heap_close(toastRel2, RowExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1496,12 +1529,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1510,11 +1545,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries are assumed to be concurrent indexes.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eeddd9a..eefadb2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8645,7 +8645,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8653,6 +8652,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8696,7 +8697,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8775,8 +8777,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 11b0040..89a1445 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ if (toastRel->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Sum the sizes of all the indexes of the toast relation */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
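As a sanity check of the dbsize.c change above, the per-index totals can be approximated from SQL. This is a sketch only: `pg_relation_size(oid)` counts just the main fork, whereas the C code also sums FSM and VM forks, and the toast table name below is a placeholder.

```sql
-- Hypothetical cross-check of calculate_toast_table_size: sum the main-fork
-- size of every index on a toast table.  The C code additionally counts
-- the FSM and VM forks.  'pg_toast.pg_toast_16384' is a placeholder name.
SELECT sum(pg_relation_size(indexrelid))
FROM pg_index
WHERE indrelid = 'pg_toast.pg_toast_16384'::regclass;
```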
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 7903b79..c62ce3b 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2512,10 +2512,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2524,7 +2523,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2549,11 +2547,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ab91ab0..7d137b4 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201302181
+#define CATALOG_VERSION_NO 201302191
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 820552f..363c0b6 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 869ca8c..470698a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1840,15 +1840,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
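For readers following the catalog change in this first patch: once `pg_class.reltoastidxid` is removed, the index (or indexes) of a relation's toast table have to be looked up through `pg_index` instead, along these lines (the table name is a placeholder):

```sql
-- Hypothetical lookup: find all indexes of the toast table attached to a
-- given relation, now that pg_class.reltoastidxid no longer exists.
SELECT i.indexrelid::regclass
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.oid = 'my_table'::regclass;  -- my_table is a placeholder name
```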
Attachment: 20130301_2_reindex_concurrently_v14.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..6d2cc53 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should either drop the index and reissue the
+ <command>CREATE INDEX CONCURRENTLY</> command, or run <command>REINDEX
+ CONCURRENTLY</>. Indexes of toast relations can be rebuilt with
+ <command>REINDEX CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,111 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index that will replace the one
+ being rebuilt is first entered into the system catalogs in one transaction.
+ Two table scans then occur in two more transactions, after which the new
+ index is made valid for the other backends. Once this is done, the old
+ and new indexes are swapped, and the old index is marked as invalid
+ in a third transaction. Finally, two additional transactions are used to mark
+ the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and run <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This also works for indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +385,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild all the indexes of a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
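To summarize the recovery procedure that the documentation above describes, a session might look like this (the table and index names are placeholders, and the failure cause shown is just one example):

```sql
-- Hypothetical recovery after a failed concurrent rebuild; "tab" and
-- "idx_cct" are placeholder names.
REINDEX TABLE CONCURRENTLY tab;   -- fails, e.g. on a uniqueness violation
DROP INDEX idx_cct;               -- remove the invalid "_cct" index left behind
REINDEX TABLE CONCURRENTLY tab;   -- retry once the offending data is fixed
```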
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 82ef726..fe25410 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -1145,7 +1145,7 @@ build_indices(void)
heap = heap_open(ILHead->il_heap, NoLock);
ind = index_open(ILHead->il_ind, NoLock);
- index_build(heap, ind, ILHead->il_info, false, false);
+ index_build(heap, ind, ILHead->il_info, false, false, true);
index_close(ind, NoLock);
heap_close(heap, NoLock);
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index ba0437a..baca453 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2653,7 +2653,7 @@ RelationTruncateIndexes(Relation heapRelation)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, currentIndex, indexInfo, false, true);
+ index_build(heapRelation, currentIndex, indexInfo, false, true, true);
/* We're done with this index */
index_close(currentIndex, NoLock);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..1fe82c8 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index used as a duplicate of an existing
+ * index during a concurrent reindex operation. This index can also be an
+ * index of a toast relation. Sufficient locks are normally already taken
+ * on the related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in that case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1083,7 +1093,7 @@ index_create(Relation heapRelation,
}
else
{
- index_build(heapRelation, indexRelation, indexInfo, isprimary, false);
+ index_build(heapRelation, indexRelation, indexInfo, isprimary, false, true);
}
/*
@@ -1095,6 +1105,393 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based must be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple;
+ Datum indclassDatum, indoptionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as the former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Get the expressions associated with this index, needed to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+
+ /* Pick up column name depending on attribute type */
+ if (attnum != 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+ else
+ {
+ Node *indnode;
+ char *nodeName;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ nodeName = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ columnNames = lappend(columnNames, nodeName);
+ }
+ }
+
+ /*
+ * The index is considered a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ indoptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(indoptionDatum);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ (Datum) indexRelation->rd_options,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken during
+ * this operation, blocking only schema changes.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new index in a concurrent context. For the time being
+ * what is done here is switching the relation names of the indexes. If extra
+ * operations are necessary during a concurrent swap, processing should be
+ * added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ char *nameNew, *nameOld, *nameTemp;
+ Oid parentOid = IndexGetRelation(oldIndexOid, false);
+ Relation oldIndexRel, newIndexRel;
+
+ /*
+ * Take a lock on the old and new index before switching their names. This
+ * avoids having the index swap rely on the relation renaming mechanism to
+ * get a lock on the relations involved.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Allocate all the names used for this operation */
+ nameNew = get_rel_name(newIndexOid);
+ nameOld = get_rel_name(oldIndexOid);
+ /* Build a unique temporary name */
+ nameTemp = ChooseRelationName((const char *) get_rel_name(oldIndexOid),
+ NULL,
+ "tmp",
+ get_rel_namespace(oldIndexOid));
+
+ /* Change the name of old index to something temporary */
+ RenameRelationInternal(oldIndexOid, nameTemp);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Change the name of the new index with the old one */
+ RenameRelationInternal(newIndexOid, nameOld);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* Finally change the name of old index with name of the new one */
+ RenameRelationInternal(oldIndexOid, nameNew);
+
+ /* Make the catalog update visible */
+ CommandCounterIncrement();
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+
+ /*
+ * Scan for foreign keys referencing the index being swapped and switch
+ * their dependencies to the new index created concurrently.
+ */
+ switchIndexConstraintOnForeignKey(parentOid, oldIndexOid, newIndexOid);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * If necessary, wait until no running transaction could be using the
+ * index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initiating an
+ * index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion is done through performDeletion, as otherwise the
+ * dependencies of the index would not be dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index to be dropped is no longer alive; if it is, it
+ * might still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1721,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1802,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1831,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1845,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1931,6 +2257,8 @@ index_update_stats(Relation rel,
*
* isprimary tells whether to mark the index as a primary-key index.
* isreindex indicates we are recreating a previously-existing index.
+ * istoastupdate tells whether it is necessary to update the toast index Oid
+ * for the parent relation.
*
* Note: when reindexing an existing index, isprimary can be false even if
* the index is a PK; it's already properly marked and need not be re-marked.
@@ -1944,7 +2272,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
{
RegProcedure procedure;
IndexBuildResult *stats;
@@ -3174,7 +3503,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
/* Initialize the index and rebuild */
/* Note: we do not need to re-establish pkey setting */
- index_build(heapRelation, iRel, indexInfo, false, true);
+ index_build(heapRelation, iRel, indexInfo, false, true, true);
}
PG_CATCH();
{
diff --git a/src/backend/catalog/pg_constraint.c b/src/backend/catalog/pg_constraint.c
index 7179fa9..63fa201 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -973,3 +973,79 @@ check_functional_grouping(Oid relid,
return result;
}
+
+/*
+ * switchIndexConstraintOnForeignKey
+ *
+ * Switch foreign key references for a given index to a new index created
+ * concurrently. This process is used when swapping indexes during a
+ * concurrent operation. Constraints that are not referenced externally,
+ * like primary keys or unique indexes, are switched using the mechanisms
+ * of index.c for concurrent index creation and drop.
+ * This function also takes care of switching the dependencies of the foreign
+ * key from the old index to the new index in pg_depend.
+ *
+ * The process consists of the following steps:
+ * 1) Scan pg_constraint and extract the list of foreign keys that refer to the
+ * parent relation of the index being swapped as confrelid.
+ * 2) From this list, find the foreign keys that use the old index as the
+ * referenced index through conindid.
+ * 3) Update the conindid field to the new index Oid on all such foreign keys.
+ * 4) Switch the dependencies of the foreign key to the new index.
+ */
+void
+switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid)
+{
+ ScanKeyData skey[1];
+ SysScanDesc conscan;
+ Relation conRel;
+ HeapTuple htup;
+
+ /*
+ * Search pg_constraint for the foreign key constraints associated
+ * with the index by scanning using conrelid.
+ */
+ ScanKeyInit(&skey[0],
+ Anum_pg_constraint_confrelid,
+ BTEqualStrategyNumber, F_OIDEQ,
+ ObjectIdGetDatum(parentOid));
+
+ conRel = heap_open(ConstraintRelationId, AccessShareLock);
+ conscan = systable_beginscan(conRel, ConstraintForeignRelidIndexId,
+ true, SnapshotNow, 1, skey);
+
+ while (HeapTupleIsValid(htup = systable_getnext(conscan)))
+ {
+ Form_pg_constraint contuple = (Form_pg_constraint) GETSTRUCT(htup);
+
+ /* Check if a foreign constraint uses the index being swapped */
+ if (contuple->contype == CONSTRAINT_FOREIGN &&
+ contuple->confrelid == parentOid &&
+ contuple->conindid == oldIndexOid)
+ {
+ /*
+ * An index has been found, so first switch all the dependencies
+ * of this foreign key from the old index to the new index.
+ */
+ changeDependencyFor(ConstraintRelationId,
+ HeapTupleGetOid(htup),
+ RelationRelationId,
+ oldIndexOid,
+ newIndexOid);
+
+ /* Then update its pg_constraint entry */
+ htup = heap_copytuple(htup);
+ contuple = (Form_pg_constraint) GETSTRUCT(htup);
+ contuple->conindid = newIndexOid;
+ simple_heap_update(conRel, &htup->t_self, htup);
+
+ /* Update the system catalog indexes */
+ CatalogUpdateIndexes(conRel, htup);
+ }
+ }
+
+ systable_endscan(conscan);
+ heap_close(conRel, AccessShareLock);
+}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 7c4ccbd..e8608c4 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -280,7 +280,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c3385a1..ce4e994 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -452,7 +448,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -599,7 +596,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -662,18 +659,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -693,27 +680,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -737,13 +710,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -772,74 +739,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -852,7 +754,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -872,6 +774,521 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation can
+ * be either an index or a table. If a table is specified, each reindexing
+ * step is performed on all the table's indexes, as well as the indexes of
+ * its toast table, at the same time.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including the
+ * indexes of its associated toast table. If the relkind is an index, the
+ * index itself will be rebuilt. The locks taken on parent relations and
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation the index is based on cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. For
+ * each index, we first create a new index based on the same definition
+ * as the old one; it is only registered in the catalogs and will be
+ * built afterwards. All these operations are performed at once for a
+ * parent relation, including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; it might be a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is needed on
+ * it as well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lock relation id of each index to protect it from a
+ * concurrent drop, then close the relations. Each entry is a palloc'd
+ * copy, since the stack variable does not survive this loop. The
+ * lockrelid of the parent relation is not stored here, to avoid taking
+ * multiple locks on the same relation; we rely instead on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock of each parent relation for the following wait
+ * phases, where the transactions of other backends might conflict with
+ * this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the lockrelid of the parent relation to the
+ * list of locked relations; the stack variable does not survive this
+ * loop.
+ */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The new
+ * indexes are marked as not ready and invalid, so that no other
+ * transaction will try to use them for INSERT or SELECT.
+ *
+ * Before committing, take a session-level lock on each parent relation,
+ * old index and concurrent index, to ensure that none of them are
+ * dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid having open transactions for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait until no running
+ * transaction could have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so reopen
+ * it to fetch the information needed for the build, then close it
+ * again before building.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapOid = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(heapOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping transactions open for
+ * an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * the concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid, as it contains all the necessary
+ * tuples. However, it might not contain tuples deleted just before the
+ * reference snapshot was taken, so we need to wait for the transactions
+ * that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The pg_index update will cause other backends to update their entries
+ * for the concurrent index, but their relcache entries also need to be
+ * invalidated.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap and mark all the indexes involved in the relation */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation to invalidate their cache
+ * entries afterwards. ShareUpdateExclusiveLock is taken here rather
+ * than a stronger lock to reduce the likelihood of deadlock, as it is
+ * already held at the session level.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, indOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The old indexes need to be marked as not ready. We also need to wait
+ * for transactions that might still use them. Each operation is
+ * performed in a separate transaction.
+ */
+
+ /* Mark the old indexes as not ready */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session already holds sufficient locks on it.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the old indexes. This needs to be done through performDeletion,
+ * or the dependencies of the old indexes will not be dropped. The
+ * internal mechanism of DROP INDEX CONCURRENTLY is not used, as the
+ * indexes are already considered dead and invalid here, so they will
+ * not be used by other backends.
+ */
+ foreach(lc, indexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Perform the drop of the old index */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index processed, do not commit the transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(indexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * The last thing to do is to release the session-level locks on the
+ * parent tables and their indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1534,7 +1951,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1560,6 +1978,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1672,18 +2097,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1747,18 +2176,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1777,7 +2221,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1789,6 +2236,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed on system catalogs, but it
+ * is allowed on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1871,15 +2327,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed with the normal process, including pg_class,
+ * as they could be corrupted and the concurrent process itself might
+ * use them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index eefadb2..d9d44e0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -891,6 +891,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check for a system index that might have been invalidated by a
+ * failed concurrent operation, and allow dropping it.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when created in a concurrent context.
+ * As this code path cannot be taken by CREATE INDEX CONCURRENTLY -- the
+ * feature is not available for exclusion constraints -- it can only be
+ * reached by REINDEX CONCURRENTLY. In that case a twin of this index
+ * exists in parallel, so this check can be bypassed here: it has
+ * already been done on the other index. If exclusion constraints become
+ * supported by CREATE INDEX CONCURRENTLY in the future, this will need
+ * to be revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 23ec88d..eac9407 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3603,6 +3603,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 99c034a..4f5c1ec 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1842,6 +1842,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index d3009b6..5f70638 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6703,29 +6703,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the relations
+ * referred to by the given LOCKTAGs. To do this, inquire which xacts
+ * currently would conflict with lockmode on each relation -- ie, which ones
+ * have a lock that permits writing the relation -- then wait for each of
+ * these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock tag.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was
+ * taken. Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 8904c6f..7360dda 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1282,15 +1282,19 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1302,8 +1306,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..bbad5fe 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -88,7 +107,8 @@ extern void index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex);
+ bool isreindex,
+ bool istoastupdate);
extern double IndexBuildHeapScan(Relation heapRelation,
Relation indexRelation,
diff --git a/src/include/catalog/indexing.h b/src/include/catalog/indexing.h
index 6251fb8..3555b14 100644
--- a/src/include/catalog/indexing.h
+++ b/src/include/catalog/indexing.h
@@ -123,6 +123,9 @@ DECLARE_INDEX(pg_constraint_contypid_index, 2666, on pg_constraint using btree(c
#define ConstraintTypidIndexId 2666
DECLARE_UNIQUE_INDEX(pg_constraint_oid_index, 2667, on pg_constraint using btree(oid oid_ops));
#define ConstraintOidIndexId 2667
+/* The following index is not used by any syscache and is not unique */
+DECLARE_INDEX(pg_constraint_confrelid_index, 3086, on pg_constraint using btree(confrelid oid_ops));
+#define ConstraintForeignRelidIndexId 3086
DECLARE_UNIQUE_INDEX(pg_conversion_default_index, 2668, on pg_conversion using btree(connamespace oid_ops, conforencoding int4_ops, contoencoding int4_ops, oid oid_ops));
#define ConversionDefaultIndexId 2668
diff --git a/src/include/catalog/pg_constraint.h b/src/include/catalog/pg_constraint.h
index 29f71f1..a37d39a 100644
--- a/src/include/catalog/pg_constraint.h
+++ b/src/include/catalog/pg_constraint.h
@@ -254,4 +254,8 @@ extern bool check_functional_grouping(Oid relid,
List *grouping_columns,
List **constraintDeps);
+extern void switchIndexConstraintOnForeignKey(Oid parentOid,
+ Oid oldIndexOid,
+ Oid newIndexOid);
+
#endif /* PG_CONSTRAINT_H */
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d54990d..71cf97c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2522,6 +2522,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On Fri, Mar 1, 2013 at 12:57 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Feb 28, 2013 at 11:26 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
I found one problem in the latest patch. I got a segmentation fault
when I executed the following SQLs:
CREATE TABLE hoge (i int);
CREATE INDEX hogeidx ON hoge(abs(i));
INSERT INTO hoge VALUES (generate_series(1,10));
REINDEX TABLE CONCURRENTLY hoge;
The error messages are:
LOG: server process (PID 33641) was terminated by signal 11: Segmentation fault
DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY hoge;

Oops. Index expressions were not correctly extracted when building
columnNames for index_create in index_concurrent_create.
Fixed in this new patch. Thanks for catching that.
I found another problem in the latest patch. When I issued the following SQLs,
I got the assertion failure.
CREATE EXTENSION pg_trgm;
CREATE TABLE hoge (col1 text);
CREATE INDEX hogeidx ON hoge USING gin (col1 gin_trgm_ops) WITH
(fastupdate = off);
INSERT INTO hoge SELECT random()::text FROM generate_series(1,100);
REINDEX TABLE CONCURRENTLY hoge;
The error message that I got is:
TRAP: FailedAssertion("!(((array)->elemtype) == 25)", File:
"reloptions.c", Line: 874)
LOG: server process (PID 45353) was terminated by signal 6: Abort trap
DETAIL: Failed process was running: REINDEX TABLE CONCURRENTLY hoge;
ISTM that the patch doesn't handle the gin option "fastupdate = off" correctly.
Anyway, I think you should test whether REINDEX CONCURRENTLY goes well
with every type of index before posting the next patch. Otherwise,
I might find another problem ;P
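Such a coverage test might look like the following; this is only a sketch against a build with the patch applied, and the table and index names are made up for illustration:

```sql
-- One index per access method on the same table, then a concurrent rebuild.
CREATE TABLE am_check (i int, t text, p point);
CREATE INDEX am_check_btree  ON am_check USING btree (i);
CREATE INDEX am_check_hash   ON am_check USING hash (i);
CREATE INDEX am_check_gin    ON am_check USING gin (to_tsvector('simple', t));
CREATE INDEX am_check_gist   ON am_check USING gist (p);
CREATE INDEX am_check_spgist ON am_check USING spgist (p);
INSERT INTO am_check SELECT g, g::text, point(g, g) FROM generate_series(1, 100) g;
REINDEX TABLE CONCURRENTLY am_check;
```

Any access method whose options or expressions are mishandled should fail here rather than in the field.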
@@ -1944,7 +2272,8 @@ index_build(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo,
bool isprimary,
- bool isreindex)
+ bool isreindex,
+ bool istoastupdate)
istoastupdate seems to be unused.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Mar 2, 2013 at 2:43 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Fixed in this new patch. Thanks for catching that.
After make installcheck finished, I connected to the "regression" database
and issued "REINDEX DATABASE CONCURRENTLY regression", then
I got the error:
ERROR: constraints cannot have index expressions
STATEMENT: REINDEX DATABASE CONCURRENTLY regression;
OTOH "REINDEX DATABASE regression" did not generate an error.
Is this a bug?
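As a first check, the catalogs can confirm whether any constraint index really records expressions; a diagnostic sketch (if has_expressions is false everywhere, the error must come from the rebuild path rather than from the catalogs):

```sql
-- Constraint-backed indexes and whether pg_index stores expressions for them.
SELECT con.conname,
       con.conindid::regclass AS index_name,
       i.indexprs IS NOT NULL AS has_expressions
FROM pg_constraint con
JOIN pg_index i ON i.indexrelid = con.conindid
WHERE con.conindid <> 0;
```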
Regards,
--
Fujii Masao
REINDEX CONCURRENTLY resets the statistics in pg_stat_user_indexes,
whereas plain REINDEX does not. I think they should be preserved in
either case.
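The reported difference can be seen from psql; a sketch against the index names used in the regression test above, showing the behavior as reported (before any fix):

```sql
-- idx_scan survives a plain REINDEX but, as reported, not a concurrent one.
SELECT idx_scan FROM pg_stat_user_indexes
WHERE indexrelname = 'concur_reindex_ind1';
REINDEX INDEX concur_reindex_ind1;               -- statistics preserved
SELECT idx_scan FROM pg_stat_user_indexes
WHERE indexrelname = 'concur_reindex_ind1';
REINDEX INDEX CONCURRENTLY concur_reindex_ind1;  -- statistics reset to 0
SELECT idx_scan FROM pg_stat_user_indexes
WHERE indexrelname = 'concur_reindex_ind1';
```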
On 2013-03-01 16:32:19 -0500, Peter Eisentraut wrote:
REINDEX CONCURRENTLY resets the statistics in pg_stat_user_indexes,
whereas plain REINDEX does not. I think they should be preserved in
either case.
Yes. Imo this further suggests that it would be better to switch the
relfilenodes (+indisclustered) of the two indexes instead of switching
the names. That would allow us to get rid of the code for moving over
dependencies as well.
Given we use an exclusive lock for the switchover phase anyway, there's
not much point in going for the name-based switch. Especially as some
eventual mvcc-correct system access would be fine with the relfilenode
method.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
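The relfilenode-based switchover Andres describes is easy to verify from SQL; a sketch of the expected behavior, reusing the index from the regression test above:

```sql
-- With a relfilenode swap, the index keeps its name and OID but
-- points at a new on-disk file after the rebuild.
SELECT oid, relname, relfilenode FROM pg_class
WHERE relname = 'concur_reindex_ind1';
REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
SELECT oid, relname, relfilenode FROM pg_class
WHERE relname = 'concur_reindex_ind1';  -- same oid, new relfilenode
```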
Hi,
Please find attached an updated patch fixing the following issues:
- gin and gist indexes are now rebuilt correctly; some option values were
not being passed to the concurrent indexes (reported by Masao).
- the swap is now done with relfilenodes and not names, so
pg_stat_user_indexes is no longer reset (reported by Peter).
I am still looking at the make installcheck issue reported previously.
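A failed or interrupted run can leave the temporary indexes behind; they are marked invalid and carry the _cct suffix described at the top of the thread. A cleanup-check sketch:

```sql
-- Invalid leftover indexes from an interrupted concurrent reindex.
SELECT indexrelid::regclass AS leftover_index
FROM pg_index
WHERE NOT indisvalid
  AND indexrelid::regclass::text LIKE '%\_cct';
```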
Regards,
On Sun, Mar 3, 2013 at 9:54 PM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2013-03-01 16:32:19 -0500, Peter Eisentraut wrote:
REINDEX CONCURRENTLY resets the statistics in pg_stat_user_indexes,
whereas plain REINDEX does not. I think they should be preserved in
either case.

Yes. Imo this further suggests that it would be better to switch the
relfilenodes (+indisclustered) of the two indexes instead of switching
the names. That would allow us to get rid of the code for moving over
dependencies as well.
Given we use an exclusive lock for the switchover phase anyway, there's
not much point in going for the name-based switch. Especially as some
eventual mvcc-correct system access would be fine with the relfilenode
method.

Greetings,
Andres Freund
--
Michael
Attachments:
20130304_1_remove_reltoastidxid.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..6db6851 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 81c1be3..e1475e6 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..be27211 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,26 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int count = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1338,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1398,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1437,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (count = 0; count < num_indexes; count++)
+ index_insert(toastidxs[count], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[count]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1465,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1475,10 +1493,13 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,20 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1536,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1550,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1570,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1583,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for scan
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1630,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1645,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1663,18 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1693,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1782,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1801,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1824,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int count = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1869,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1921,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2018,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0ecfc78..043b279 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..c5f6a0a 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,53 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can actually be safely done only if all the indexes
+ * have valid Oids.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);
+
+ /* Obtain index list if necessary */
+ if (toastRel1->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel1);
+ if (toastRel2->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) &&
+ !list_member_oid(toastRel2->rd_indexlist, InvalidOid) &&
+ list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ heap_close(toastRel1, RowExclusiveLock);
+ heap_close(toastRel2, RowExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1547,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1563,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries are thought as being concurrent indexes.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..0d6f5c0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8678,7 +8678,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8686,6 +8685,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8729,7 +8730,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8808,8 +8810,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 8963266..3dd2fda 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -577,8 +577,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -590,7 +590,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..11921ac 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ if (toastRel->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the indexes of the toast relation */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e6c85ac..f15e6a2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ab91ab0..7d137b4 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201302181
+#define CATALOG_VERSION_NO 201302191
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130304_2_reindex_concurrently_v15.patch (application/octet-stream)
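For reference, the intended usage of the feature in this patch can be sketched as follows (a hypothetical session; `tab` and `ind` are placeholder names, and `_cct` is the suffix the patch appends to the temporary concurrent index):

```sql
-- Rebuild one index, or all indexes of a table, without blocking
-- concurrent reads and writes on the table
REINDEX INDEX ind CONCURRENTLY;
REINDEX TABLE tab CONCURRENTLY;

-- If a concurrent rebuild fails (e.g. on a uniqueness violation), an
-- invalid index suffixed with _cct is left behind; drop it and retry
DROP INDEX ind_cct;
REINDEX INDEX ind CONCURRENTLY;
```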
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..051ebd7 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,112 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the
+ one to be rebuilt is first entered into the system catalogs in one
+ transaction, then two table scans occur in two more transactions to build
+ the new index and make it valid for the other backends. Once this is
+ done, the old and new indexes are swapped, and the old index is marked
+ as invalid in a third transaction. Finally, two additional transactions
+ are used to mark the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</literal>. This works as well with indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are rebuilt
+ concurrently if the relation they depend on is a non-system relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +386,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..3db10c3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,6 +43,7 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
@@ -672,6 +673,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create a duplicate of an existing index as part of
+ * a concurrent operation. Such an index can also be a toast index.
+ * Sufficient locks are normally already taken on the related relations
+ * when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +700,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +744,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation; in that case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1095,6 +1105,395 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Get the expressions associated with this index to build the column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+
+ /* Pick up column name depending on attribute type */
+ if (attnum != 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ columnNames = lappend(columnNames, pstrdup(NameStr(attform->attname)));
+ }
+ else
+ {
+ Node *indnode;
+ char *nodeName;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ nodeName = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ columnNames = lappend(columnNames, nodeName);
+ }
+ }
+
+ /*
+ * Index is considered as a constraint if it is UNIQUE, PRIMARY KEY or
+ * EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisunique ||
+ indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index with the new one in a concurrent context. For the
+ * time being what is done here is switching the relfilenode of the indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initiating an
+ * index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion is done through performDeletion, otherwise the
+ * dependencies of the index would not be dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it
+ * might still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
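The drop-target decision in index_concurrent_drop above can be sketched as a standalone C snippet. This is not PostgreSQL code: the type and the class-ID constants are simplified stand-ins for ObjectAddress, ConstraintRelationId and RelationRelationId.

```c
#include <assert.h>

typedef unsigned int Oid;

#define InvalidOid        ((Oid) 0)
#define CONSTRAINT_CLASS  2606    /* stand-in for ConstraintRelationId */
#define RELATION_CLASS    1259    /* stand-in for RelationRelationId */

typedef struct
{
    Oid classId;
    Oid objectId;
    int objectSubId;
} ObjectAddress;

/*
 * Mirror of the decision in index_concurrent_drop: when the index backs a
 * constraint, the constraint is the object registered for deletion so that
 * performDeletion() removes the dependencies correctly; otherwise the
 * index relation itself is the drop target.
 */
static ObjectAddress
choose_drop_target(Oid indexOid, Oid constraintOid)
{
    ObjectAddress object;

    if (constraintOid != InvalidOid)
    {
        object.classId = CONSTRAINT_CLASS;
        object.objectId = constraintOid;
    }
    else
    {
        object.classId = RELATION_CLASS;
        object.objectId = indexOid;
    }
    object.objectSubId = 0;
    return object;
}
```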
/*
* index_constraint_create
*
@@ -1324,7 +1723,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1804,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1833,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1847,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
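The loop that this hunk factors into WaitForVirtualLocks() has the following shape; a standalone sketch (not PostgreSQL code, with VirtualTransactionId reduced to one field and the actual VirtualXactLock() wait replaced by a counter):

```c
#include <assert.h>

/* Simplified virtual transaction ID: 0 marks the invalid terminator */
typedef struct
{
    int backendId;
} VirtualTransactionId;

#define VXID_IS_VALID(v) ((v).backendId != 0)

/*
 * Shape of the loop factored into WaitForVirtualLocks(): GetLockConflicts()
 * returns an array of lock holders terminated by an invalid entry, and the
 * caller waits on each in turn.
 */
static int
wait_for_virtual_locks(const VirtualTransactionId *holders)
{
    int waited = 0;

    while (VXID_IS_VALID(*holders))
    {
        /* real code would call VirtualXactLock(*holders, true) here */
        waited++;
        holders++;
    }
    return waited;
}
```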
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..a12dcb9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,27 +681,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,74 +740,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,521 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each step of the
+ * process is applied at once to all the table's indexes, as well as to
+ * its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including the indexes of
+ * its associated toast table. If the relkind is an index, that index
+ * itself will be rebuilt. The locks taken on parent relations and
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be rebuilt concurrently.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes to rebuild, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. First,
+ * for each index, a new one is created with the same definition as the
+ * former index; it is only registered in the catalogs and will be built
+ * later. These operations can be performed for all the indexes of a
+ * parent relation at once, including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index parent relation, might be a plain table or a toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NIL,
+ NIL,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock on it is
+ * also needed.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid of each index to protect it from being dropped,
+ * then close the relations. Palloc'd copies are stored in the list,
+ * which must outlive this loop iteration. The lockrelid of the parent
+ * relation is not stored here, to avoid taking multiple locks on the
+ * same relation; we rely instead on parentRelationIds built earlier.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save a lock tag for each parent relation, used by the following wait
+ * phases where other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add the lockrelid of the parent relation to the list of locked
+ * relations. A palloc'd copy is stored, as the list must outlive
+ * this loop iteration.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. Each new
+ * index is marked as not ready and invalid, so that no other transaction
+ * will try to use it for INSERT or SELECT.
+ *
+ * Before committing, take a session-level lock on each parent relation,
+ * old index and concurrent index, to ensure that none of them are
+ * dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping transactions open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait until no running
+ * transaction could still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so
+ * reopen it and fetch what is needed before closing it again; the
+ * Relation must not be dereferenced after index_close().
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapOid = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(heapOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked as valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction, to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid, as it contains all the necessary
+ * tuples. However, it might not contain tuples deleted just before the
+ * reference snapshot was taken, so we need to wait for transactions
+ * that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The pg_index update will cause backends to refresh their entries for
+ * the concurrent index, but the relcache must also be invalidated.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation so that their caches can
+ * be invalidated. ShareUpdateExclusiveLock is used here, matching the
+ * session-level lock already taken, to reduce the likelihood of
+ * deadlock.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the swapped
+ * indexes. They must be marked as dead so that no remaining transaction
+ * will try to use them. Each operation is performed in a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as dead */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as it
+ * is certain that this session holds sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, otherwise the dependencies of the old indexes
+ * would not be dropped. The internal mechanism of DROP INDEX
+ * CONCURRENTLY is not used, as the indexes are already considered
+ * dead and invalid, so they cannot be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the concurrent index, which now carries the old data */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index to be treated, do not commit transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(concurrentIndexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the parent
+ * tables and their indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
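Several of the phases above walk indexIds and concurrentIndexIds in lockstep. The traversal pattern can be sketched standalone as follows (not PostgreSQL code: the cell struct is a simplified stand-in for pg_list's ListCell, with foreach/lnext spelled out as plain pointer walks):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Simplified singly-linked list cell, standing in for PostgreSQL's List */
typedef struct ListCell
{
    Oid oid;
    struct ListCell *next;
} ListCell;

/*
 * Shape of the lockstep traversal used by each phase: one loop over the
 * old-index list while a second cell pointer walks the concurrent-index
 * list via lnext(), pairing each old index with its replacement.
 */
static int
pair_lists(ListCell *indexIds, ListCell *concurrentIds,
           Oid *oldOut, Oid *newOut, int max)
{
    ListCell *lc2 = concurrentIds;      /* list_head(concurrentIndexIds) */
    ListCell *lc;
    int n = 0;

    for (lc = indexIds; lc != NULL && lc2 != NULL && n < max; lc = lc->next)
    {
        oldOut[n] = lc->oid;            /* old index */
        newOut[n] = lc2->oid;           /* its concurrent replacement */
        lc2 = lc2->next;                /* lnext(lc2) */
        n++;
    }
    return n;
}
```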
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1952,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1979,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2098,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2177,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2222,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2237,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed on system catalogs, but it is
+ * allowed on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2329,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently; they still need
+ * to be reindexed, including pg_class, with a normal process, as they
+ * could be corrupted and the concurrent process itself uses them.
+ * This does not include toast relations, which are reindexed when
+ * their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0d6f5c0..0bd67a2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by
+ * a failed concurrent process, and allow it to be dropped.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created concurrently, and
+ * this code path cannot be taken by CREATE INDEX CONCURRENTLY, which
+ * does not support exclusion constraints, so this code path can only be
+ * reached through REINDEX CONCURRENTLY. In that case the same index
+ * exists in parallel with this one, so we can bypass this check, as it
+ * has already been done on the parallel index. If CREATE INDEX
+ * CONCURRENTLY ever supports exclusion constraints, this should be
+ * revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the given
+ * relations. To do this, inquire which xacts currently would conflict with
+ * the given lockmode on the relation referred to by each LOCKTAG -- ie,
+ * which ones hold a lock that permits writing the relation. Then wait for
+ * each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, since it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..db2a531 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..d03a1f6 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,46 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind2" btree (c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..91ee74e 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,33 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Hi all,
Please find attached a patch fixing the last issue that Masao found with
make installcheck. Now REINDEX DATABASE CONCURRENTLY on the regression
database passes. There were 2 problems:
- Concurrent indexes for unique indexes using expressions were not
correctly created
- Concurrent indexes for indexes with duplicate column names were not
correctly created.
So, this solves the last issue currently on the stack. I added some new
tests to the regression suite to cover those problems.
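For illustration, the two fixed cases can be reproduced with statements
along these lines (table and index names here are hypothetical, not taken
from the patch):

```sql
-- Hypothetical names; both index shapes previously produced broken
-- concurrent indexes during REINDEX CONCURRENTLY.
CREATE TABLE concur_expr_tab (a int, b text);

-- Case 1: a unique index using an expression
CREATE UNIQUE INDEX concur_expr_ind ON concur_expr_tab (lower(b));

-- Case 2: an index repeating the same column name
CREATE INDEX concur_dup_ind ON concur_expr_tab (a, a);

-- With the fix, both are rebuilt correctly
REINDEX TABLE CONCURRENTLY concur_expr_tab;
```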
Regards,
--
Michael
Attachments:
20130304_1_remove_reltoastidxid.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..6db6851 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 81c1be3..e1475e6 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..be27211 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,26 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int count = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple
+ * identical indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1338,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1398,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1437,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (count = 0; count < num_indexes; count++)
+ index_insert(toastidxs[count], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[count]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1465,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1475,10 +1493,13 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,20 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first index, but we must take a lock on
+ * all of them.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1536,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1550,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1570,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1583,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first available index for the scan.
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1630,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1645,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int count = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1663,18 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1693,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1782,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1801,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1824,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int count = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1869,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[count++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1921,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2018,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (count = 0; count < num_indexes; count++)
+ index_close(toastidxs[count], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0ecfc78..043b279 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..c5f6a0a 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,53 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if all the indexes
+ * have valid OIDs.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);
+
+ /* Obtain index list if necessary */
+ if (toastRel1->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel1);
+ if (toastRel2->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) &&
+ !list_member_oid(toastRel2->rd_indexlist, InvalidOid) &&
+ list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair of indexes */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ heap_close(toastRel1, RowExclusiveLock);
+ heap_close(toastRel2, RowExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1547,14 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1563,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries are assumed to be leftover concurrent indexes.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..0d6f5c0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8678,7 +8678,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8686,6 +8685,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8729,7 +8730,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8808,8 +8810,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 8963266..3dd2fda 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -577,8 +577,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -590,7 +590,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..11921ac 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ if (toastRel->rd_indexvalid == 0)
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Sum the sizes of all the indexes of the toast relation */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e6c85ac..f15e6a2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ab91ab0..7d137b4 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201302181
+#define CATALOG_VERSION_NO 201302191
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
20130304_2_reindex_concurrently_v16.patch
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..051ebd7 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you can either drop the index and reissue the
+ <command>CREATE INDEX CONCURRENTLY</> command, or run
+ <command>REINDEX CONCURRENTLY</>. Indexes of toast relations can be
+ rebuilt with <command>REINDEX CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,112 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and, in
+ addition, it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index whose storage will replace the
+ one being rebuilt is first entered into the system catalogs in one
+ transaction; then two table scans occur in two more transactions to build
+ the new index and make it valid for the other backends. Once this is
+ done, the old and new indexes are swapped, and the index used during the
+ process is marked as invalid, in yet another transaction. Finally, two
+ additional transactions are used to mark the concurrent index as not
+ ready and then
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending with
+ the suffix _cct. This also works for indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +386,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..f72efbb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create the index as a duplicate of an existing index
+ * during a concurrent operation. Such an index can also belong to a toast
+ * relation. Sufficient locks are normally already taken on the related
+ * relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,23 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
* This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * concurrent reindex operation, in which case sufficient locks are already
+ * taken on the related relations.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1095,6 +1106,427 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into the catalogs and needs to be built
+ * later on. This is called during concurrent index processing. The heap
+ * relation on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initially deferred; this depends on
+ * the constraint it enforces, if any.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constTuple;
+ Form_pg_constraint constraint;
+
+ constTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+ initdeferred = constraint->condeferred;
+
+ ReleaseSysCache(constTuple);
+ }
+
+ /* Get expressions associated with this index, for building column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ int j;
+ char buf[NAMEDATALEN];
+
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+
+ /* Pick up column name depending on attribute type */
+ if (attnum != 0)
+ {
+ /*
+ * This is a column attribute, so simply pick the column name from
+ * the relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the name picked conflicts with any existing name and, if
+ * so, make it unique by appending a numeric suffix. Note that "j" is
+ * used here so as not to shadow the outer loop counter "i".
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /*
+ * The index is considered a constraint if it is a PRIMARY KEY or an
+ * EXCLUSION constraint.
+ */
+ isconstraint = indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
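
The column-name deduplication loop in index_concurrent_create is compact and easy to misread, so here is a self-contained sketch of the same suffixing scheme. This is an illustration only: it clips bytes with plain string functions instead of pg_mbcliplen, and uses a fixed-size array in place of a List; the function name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64          /* as in PostgreSQL's pg_config_manual.h */

/*
 * Append an increasing numeric suffix to "origname" until the result does
 * not collide with any of the "nchosen" names already picked, clipping the
 * base name so the result stays under NAMEDATALEN. Byte-based clipping
 * stands in for pg_mbcliplen here.
 */
static void
choose_unique_colname(const char *origname, char chosen[][NAMEDATALEN],
                      int nchosen, char *out)
{
    int i;

    /* First candidate is the original name itself */
    strncpy(out, origname, NAMEDATALEN - 1);
    out[NAMEDATALEN - 1] = '\0';

    for (i = 1;; i++)
    {
        int j;
        int conflict = 0;
        char nbuf[32];
        size_t nlen, suffixlen;

        for (j = 0; j < nchosen; j++)
            if (strcmp(out, chosen[j]) == 0)
                conflict = 1;
        if (!conflict)
            return;             /* found nonconflicting name */

        /* Build the next candidate: clipped base + numeric suffix */
        snprintf(nbuf, sizeof(nbuf), "%d", i);
        suffixlen = strlen(nbuf);
        nlen = strlen(origname);
        if (nlen > NAMEDATALEN - 1 - suffixlen)
            nlen = NAMEDATALEN - 1 - suffixlen;
        memcpy(out, origname, nlen);
        strcpy(out + nlen, nbuf);
    }
}
```

With names "id" and "id1" already taken, a new "id" becomes "id2", while a non-conflicting name is kept unchanged.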
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed, so that only schema changes are prevented.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old index with the new one in a concurrent context. For the time
+ * being this is done by switching the relfilenode of the two indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction in which this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
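
The swap exchanges relfilenodes rather than renaming or dropping the indexes so that the old index keeps its OID, and everything that refers to the index by OID (constraints, pg_depend entries, cached plans) stays valid. A minimal standalone sketch of the exchange, using a hypothetical pared-down stand-in for Form_pg_class with only the swapped field:

```c
#include <assert.h>

typedef unsigned int Oid;

/* Hypothetical pared-down stand-in for Form_pg_class */
typedef struct MinimalPgClassForm
{
    Oid relfilenode;            /* on-disk file backing the relation */
} MinimalPgClassForm;

/*
 * Exchange the on-disk storage of two index entries, as the core of
 * index_concurrent_swap does. In the real function both pg_class rows
 * must then be updated in the same transaction so the swap is atomic.
 */
static void
swap_relfilenode(MinimalPgClassForm *oldIndex, MinimalPgClassForm *newIndex)
{
    Oid tmpnode = oldIndex->relfilenode;

    oldIndex->relfilenode = newIndex->relfilenode;
    newIndex->relfilenode = tmpnode;
}
```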
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of its
+ * parent relation. This function should be called when initiating a
+ * concurrent index drop, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of concurrent index
+ * processing. Deletion is done through performDeletion, otherwise the
+ * dependencies of the index would not be dropped. At this point all the
+ * indexes are already considered as invalid and dead, so they can be
+ * dropped without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+ * Check that the index being dropped is no longer alive; if it were,
+ * it might still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ indislive = indexForm->indislive;
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /* Leave if index is still alive */
+ if (indislive)
+ return;
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1756,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1837,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1866,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1880,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..a12dcb9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,27 +681,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,74 +740,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,521 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is performed on all of its indexes at once, including its dependent toast
+ * indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including its
+ * associated toast table indexes. If the relkind is an index, this index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. First
+ * we need to create, for each index, a new index based on the same
+ * definition as the former one; at this stage it is only registered in
+ * the catalogs and will be built afterwards. All these operations can be
+ * performed at once for a parent relation, including the indexes of its
+ * toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a plain or toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also needed
+ * on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save a copy of the lockrelid to protect each concurrent relation from
+ * being dropped, then close the relations. A palloc'd copy is stored
+ * because the local variable goes out of scope at each loop iteration,
+ * so appending its address would leave all list cells pointing at the
+ * same stack slot. The lockrelid of the parent relation is not saved
+ * here, to avoid taking multiple locks on the same relation; we rely
+ * instead on parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks; the following wait and visibility-check phases
+ * use them to detect other backends that might conflict with this
+ * session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Store a palloc'd copy; the relcache entry is closed below */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+
+ /* Add lockrelid of parent relation to the list of locked relations */
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the
+ * concurrent index and its copy to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid having open transactions for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, wait until no running transaction
+ * could still have the parent table open with the old index definition.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /* Get the first element of concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ foreach(lc, indexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapId;
+ bool primary;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Index relation has been closed by the previous commit, so reopen it
+ * and fetch what we need before closing it again; the relcache entry
+ * must not be dereferenced after index_close().
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapId = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(heapId,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs
+ * that might have occurred in the parent table, and are marked valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid holding a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * This concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not have taken into account tuples deleted
+ * before the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The pg_index update will cause backends to update their entries for
+ * the concurrent index, but the relcache needs to be invalidated as
+ * well.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Get the first element of the concurrent index list */
+ lc2 = list_head(concurrentIndexIds);
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ foreach(lc, indexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Move to next concurrent item */
+ lc2 = lnext(lc2);
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation to invalidate their caches.
+ * ShareUpdateExclusiveLock is taken here, matching the session-level
+ * lock already held, to reduce the likelihood of deadlock.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark as invalid the index that will hold the old data after the swap */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes, so mark them as dead to prevent transactions from using
+ * them. Each operation is performed in a separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session already holds sufficient locks on it.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies of the old indexes would not be
+ * dropped. The internal mechanism of DROP INDEX CONCURRENTLY is not
+ * used, as here the indexes are already considered dead and invalid,
+ * so they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this index, which is by now dead, invalid and unusable */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * For the last index to be treated, do not commit transaction yet.
+ * This will be done once all the locks on indexes and parent relations
+ * are released.
+ */
+ if (indexOid != llast_oid(concurrentIndexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent
+ * table and on the indexes of the table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1952,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1979,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2098,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2177,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2222,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2237,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * CONCURRENTLY is not allowed for REINDEX SYSTEM, but it is for
+ * REINDEX DATABASE.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2329,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself relies
+ * on them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0d6f5c0..0bd67a2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,36 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index only exists when created in a concurrent context, and
+ * this code path cannot be taken by CREATE INDEX CONCURRENTLY, as that
+ * feature is not available for exclusion constraints; hence this code
+ * path can only be taken by REINDEX CONCURRENTLY. In this case the same
+ * index exists in parallel to this one, so we can bypass this check, as
+ * it has already been done on the other index existing in parallel.
+ * If exclusion constraints are supported in the future by CREATE INDEX
+ * CONCURRENTLY, this should be removed or completed accordingly.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the given
+ * locktags. To do this, inquire which xacts currently would conflict with
+ * lockmode on the relation referred to by each LOCKTAG -- ie, which ones
+ * have a lock that permits writing the relation. Then wait for each of
+ * these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was
+ * taken. Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..db2a531 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..88ec81a 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,54 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a0b2ae2 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,39 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Hi,
Have you benchmarked the toastrelidx removal stuff in any way? If not,
that's fine, but if yes I'd be interested.
On 2013-03-04 22:33:53 +0900, Michael Paquier wrote:
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
 			struct varlena * oldexternal, int options)
 {
 	Relation	toastrel;
-	Relation	toastidx;
+	Relation   *toastidxs;
 	HeapTuple	toasttup;
 	TupleDesc	toasttupDesc;
 	Datum		t_values[3];
@@ -1257,15 +1257,26 @@ toast_save_datum(Relation rel, Datum value,
 	char	   *data_p;
 	int32		data_todo;
 	Pointer		dval = DatumGetPointer(value);
+	ListCell   *lc;
+	int			count = 0;
I find count a confusing name for a loop iteration variable... i, or
idxno, or ...
+ int num_indexes;
 	/*
 	 * Open the toast relation and its index.  We can use the index to check
 	 * uniqueness of the OID we assign to the toasted item, even though it has
-	 * additional columns besides OID.
+	 * additional columns besides OID. A toast table can have multiple identical
+	 * indexes associated to it.
 	 */
 	toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
 	toasttupDesc = toastrel->rd_att;
-	toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+	if (toastrel->rd_indexvalid == 0)
+		RelationGetIndexList(toastrel);
Hm, I think we should move this into a macro, this is cropping up at
more and more places.
-	index_insert(toastidx, t_values, t_isnull,
-				 &(toasttup->t_self),
-				 toastrel,
-				 toastidx->rd_index->indisunique ?
-				 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+	for (count = 0; count < num_indexes; count++)
+		index_insert(toastidxs[count], t_values, t_isnull,
+					 &(toasttup->t_self),
+					 toastrel,
+					 toastidxs[count]->rd_index->indisunique ?
+					 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
The indisunique check looks like a copy & pasto to me, albeit not
yours...
 	/*
 	 * Create the TOAST pointer value that we'll return
@@ -1475,10 +1493,13 @@ toast_delete_datum(Relation rel, Datum value)
 	struct varlena *attr = (struct varlena *) DatumGetPointer(value);
 	struct varatt_external toast_pointer;
+	/*
+	 * We actually use only the first index but taking a lock on all is
+	 * necessary.
+	 */
Hm, is it guaranteed that the first index is valid?
+	foreach(lc, toastrel->rd_indexlist)
+		toastidxs[count++] = index_open(lfirst_oid(lc), RowExclusiveLock);
 	/*
-	 * If we're swapping two toast tables by content, do the same for their
-	 * indexes.
+	 * If we're swapping two toast tables by content, do the same for all of
+	 * their indexes. The swap can actually be safely done only if all the indexes
+	 * have valid Oids.
 	 */
What's an index without a valid oid?
 	if (swap_toast_by_content &&
-		relform1->reltoastidxid && relform2->reltoastidxid)
-		swap_relation_files(relform1->reltoastidxid,
-							relform2->reltoastidxid,
-							target_is_pg_class,
-							swap_toast_by_content,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							mapped_tables);
+		relform1->reltoastrelid &&
+		relform2->reltoastrelid)
+	{
+		Relation	toastRel1, toastRel2;
+
+		/* Open relations */
+		toastRel1 = heap_open(relform1->reltoastrelid, RowExclusiveLock);
+		toastRel2 = heap_open(relform2->reltoastrelid, RowExclusiveLock);
Shouldn't those be Access Exlusive Locks?
+		/* Obtain index list if necessary */
+		if (toastRel1->rd_indexvalid == 0)
+			RelationGetIndexList(toastRel1);
+		if (toastRel2->rd_indexvalid == 0)
+			RelationGetIndexList(toastRel2);
+
+		/* Check if the swap is possible for all the toast indexes */
So there's no error being thrown if this turns out not to be possible?
+		if (!list_member_oid(toastRel1->rd_indexlist, InvalidOid) &&
+			!list_member_oid(toastRel2->rd_indexlist, InvalidOid) &&
+			list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
+		{
+			ListCell   *lc1, *lc2;
+
+			/* Now swap each couple */
+			lc2 = list_head(toastRel2->rd_indexlist);
+			foreach(lc1, toastRel1->rd_indexlist)
+			{
+				Oid			indexOid1 = lfirst_oid(lc1);
+				Oid			indexOid2 = lfirst_oid(lc2);
+				swap_relation_files(indexOid1,
+									indexOid2,
+									target_is_pg_class,
+									swap_toast_by_content,
+									InvalidTransactionId,
+									InvalidMultiXactId,
+									mapped_tables);
+				lc2 = lnext(lc2);
+			}
+		}
+
+		heap_close(toastRel1, RowExclusiveLock);
+		heap_close(toastRel2, RowExclusiveLock);
+	}
/* rename the toast table ... */
@@ -1528,11 +1563,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
 								   NewToastName);
-		/* ... and its index too */
-		snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
-				 OIDOldHeap);
-		RenameRelationInternal(toastidx,
-							   NewToastName);
+		/* ... and its indexes too */
+		foreach(lc, toastrel->rd_indexlist)
+		{
+			/*
+			 * The first index keeps the former toast name and the
+			 * following entries are thought as being concurrent indexes.
+			 */
+			if (count == 0)
+				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+						 OIDOldHeap);
+			else
+				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+						 OIDOldHeap, count);
+			RenameRelationInternal(lfirst_oid(lc),
+								   NewToastName);
+			count++;
+		}
Hm. It seems wrong that this layer needs to know about _cct.
 /*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
  * Must not be applied to non-TOAST relations.
  */
 static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
 {
...
+	/* Size is evaluated based on the first index available */
Uh. Why? Imo all indexes should be counted.
+	foreach(lc, toastRel->rd_indexlist)
+	{
+		Relation	toastIdxRel;
+		toastIdxRel = relation_open(lfirst_oid(lc),
+									AccessShareLock);
+		for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+			size += calculate_relation_size(&(toastIdxRel->rd_node),
+											toastIdxRel->rd_backend, forkNum);
+
+		relation_close(toastIdxRel, AccessShareLock);
+	}
-#define CATALOG_VERSION_NO	201302181
+#define CATALOG_VERSION_NO	20130219
Think you forgot a digit here ;)
 /*
  * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * in the grammar anyway, so it can't happen. This might be called during a
+ * conccurrent reindex operation, in this case sufficient locks are already
+ * taken on the related relations.
  */
I'd rather change that to something like
/*
* This case is currently only supported during a concurrent index
* rebuild, but there is no way to ask for it in the grammar otherwise
* anyway.
*/
Or similar.
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which is based the index needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
...
+	/*
+	 * Determine if index is initdeferred, this depends on its dependent
+	 * constraint.
+	 */
+	if (OidIsValid(constraintOid))
+	{
+		/* Look for the correct value */
+		HeapTuple	constTuple;
+		Form_pg_constraint constraint;
+
+		constTuple = SearchSysCache1(CONSTROID,
+									 ObjectIdGetDatum(constraintOid));
+		if (!HeapTupleIsValid(constTuple))
+			elog(ERROR, "cache lookup failed for constraint %u",
+				 constraintOid);
+		constraint = (Form_pg_constraint) GETSTRUCT(constTuple);
+		initdeferred = constraint->condeferred;
+
+		ReleaseSysCache(constTuple);
+	}
Very, very nitpicky, but I find "constTuple" to be confusing, I thought
at first it meant that the tuple shouldn't be modified or something.
+	/*
+	 * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
+	 */
+	isconstraint = indexRelation->rd_index->indisprimary ||
+		indexRelation->rd_index->indisexclusion;
unique constraints aren't mattering here?
+/*
+ * index_concurrent_swap
+ *
+ * Replace old index by old index in a concurrent context. For the time being
+ * what is done here is switching the relation relfilenode of the indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+	Relation	oldIndexRel, newIndexRel, pg_class;
+	HeapTuple	oldIndexTuple, newIndexTuple;
+	Form_pg_class oldIndexForm, newIndexForm;
+	Oid			tmpnode;
+
+	/*
+	 * Take an exclusive lock on the old and new index before swapping them.
+	 */
+	oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+	newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+	/* Now swap relfilenode of those indexes */
Any chance to reuse swap_relation_files here? Not sure whether it would
be beneficial given that it is more generic and normally works on a
relation level...
We probably should remove the fsm of the index altogether after this?
+	pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+	oldIndexTuple = SearchSysCacheCopy1(RELOID,
+										ObjectIdGetDatum(oldIndexOid));
+	if (!HeapTupleIsValid(oldIndexTuple))
+		elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+	newIndexTuple = SearchSysCacheCopy1(RELOID,
+										ObjectIdGetDatum(newIndexOid));
+	if (!HeapTupleIsValid(newIndexTuple))
+		elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+	oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+	newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+	/* Here is where the actual swapping happens */
+	tmpnode = oldIndexForm->relfilenode;
+	oldIndexForm->relfilenode = newIndexForm->relfilenode;
+	newIndexForm->relfilenode = tmpnode;
+
+	/* Then update the tuples for each relation */
+	simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+	simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+	CatalogUpdateIndexes(pg_class, oldIndexTuple);
+	CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+	/* Close relations and clean up */
+	heap_close(pg_class, RowExclusiveLock);
+
+	/* The lock taken previously is not released until the end of transaction */
+	relation_close(oldIndexRel, NoLock);
+	relation_close(newIndexRel, NoLock);
It might be worthwhile adding a heap_freetuple here for (old,
new)IndexTuple, just to spare the reader the thinking whether it needs
to be done.
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of an index concurrent
+ * process. Deletion is done through performDeletion or dependencies of the
+ * index are not dropped. At this point all the indexes are already considered
+ * as invalid and dead so they can be dropped without using any concurrent
+ * options.
+ */
"or dependencies of the index would not get dropped"?
+void
+index_concurrent_drop(Oid indexOid)
+{
+	Oid			constraintOid = get_index_constraint(indexOid);
+	ObjectAddress object;
+	Form_pg_index indexForm;
+	Relation	pg_index;
+	HeapTuple	indexTuple;
+	bool		indislive;
+
+	/*
+	 * Check that the index dropped here is not alive, it might be used by
+	 * other backends in this case.
+	 */
+	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									 ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+	indislive = indexForm->indislive;
+
+	/* Clean up */
+	heap_close(pg_index, RowExclusiveLock);
+
+	/* Leave if index is still alive */
+	if (indislive)
+		return;
This seems like a confusing path? Why is it valid to get here with a
valid index and why is it ok to silently ignore that case?
 /*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is done in parallel with all the table's indexes as well as its dependent
+ * toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+	List	   *concurrentIndexIds = NIL,
+			   *indexIds = NIL,
+			   *parentRelationIds = NIL,
+			   *lockTags = NIL,
+			   *relationLocks = NIL;
+	ListCell   *lc, *lc2;
+	Snapshot	snapshot;
+
+	/*
+	 * Extract the list of indexes that are going to be rebuilt based on the
+	 * list of relation Oids given by caller. For each element in given list,
+	 * If the relkind of given relation Oid is a table, all its valid indexes
+	 * will be rebuilt, including its associated toast table indexes. If
+	 * relkind is an index, this index itself will be rebuilt. The locks taken
+	 * parent relations and involved indexes are kept until this transaction
+	 * is committed to protect against schema changes that might occur until
+	 * the session lock is taken on each relation.
+	 */
+	switch (get_rel_relkind(relationOid))
+	{
+		case RELKIND_RELATION:
+			{
+				/*
+				 * In the case of a relation, find all its indexes
+				 * including toast indexes.
+				 */
+				Relation	heapRelation = heap_open(relationOid,
+													 ShareUpdateExclusiveLock);
+
+				/* Track this relation for session locks */
+				parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+				/* Relation on which is based index cannot be shared */
+				if (heapRelation->rd_rel->relisshared)
+					ereport(ERROR,
+							(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+							 errmsg("concurrent reindex is not supported for shared relations")));
+
+				/* Add all the valid indexes of relation to list */
+				foreach(lc2, RelationGetIndexList(heapRelation))
Hm. This means we will not notice having about-to-be dropped indexes
around. Which seems safe because locks will prevent that anyway...
+		default:
+			/* nothing to do */
+			break;
Shouldn't we error out?
+	foreach(lc, indexIds)
+	{
+		Relation	indexRel;
+		Oid			indOid = lfirst_oid(lc);
+		Oid			concurrentOid = lfirst_oid(lc2);
+		bool		primary;
+
+		/* Move to next concurrent item */
+		lc2 = lnext(lc2);
forboth()
+	/*
+	 * Phase 3 of REINDEX CONCURRENTLY
+	 *
+	 * During this phase the concurrent indexes catch up with the INSERT that
+	 * might have occurred in the parent table and are marked as valid once done.
+	 *
+	 * We once again wait until no transaction can have the table open with
+	 * the index marked as read-only for updates. Each index validation is done
+	 * with a separate transaction to avoid opening transaction for an
+	 * unnecessary too long time.
+	 */
Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.
+	/*
+	 * Perform a scan of each concurrent index with the heap, then insert
+	 * any missing index entries.
+	 */
+	foreach(lc, concurrentIndexIds)
+	{
+		Oid			indOid = lfirst_oid(lc);
+		Oid			relOid;
+
+		/* Open separate transaction to validate index */
+		StartTransactionCommand();
+
+		/* Get the parent relation Oid */
+		relOid = IndexGetRelation(indOid, false);
+
+		/*
+		 * Take the reference snapshot that will be used for the concurrent indexes
+		 * validation.
+		 */
+		snapshot = RegisterSnapshot(GetTransactionSnapshot());
+		PushActiveSnapshot(snapshot);
+
+		/* Validate index, which might be a toast */
+		validate_index(relOid, indOid, snapshot);
+
+		/*
+		 * This concurrent index is now valid as they contain all the tuples
+		 * necessary. However, it might not have taken into account deleted tuples
+		 * before the reference snapshot was taken, so we need to wait for the
+		 * transactions that might have older snapshots than ours.
+		 */
+		WaitForOldSnapshots(snapshot);
+
+		/*
+		 * Concurrent index can now be marked as valid -- update pg_index
+		 * entries.
+		 */
+		index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+		/*
+		 * The pg_index update will cause backends to update its entries for the
+		 * concurrent index but it is necessary to do the same thing for cache.
+		 */
+		CacheInvalidateRelcacheByRelid(relOid);
+
+		/* we can now do away with our active snapshot */
+		PopActiveSnapshot();
+
+		/* And we can remove the validating snapshot too */
+		UnregisterSnapshot(snapshot);
+
+		/* Commit this transaction to make the concurrent index valid */
+		CommitTransactionCommand();
+	}
+	/*
+	 * Phase 5 of REINDEX CONCURRENTLY
+	 *
+	 * The concurrent indexes now hold the old relfilenode of the other indexes
+	 * transactions that might use them. Each operation is performed with a
+	 * separate transaction.
+	 */
+
+	/* Now mark the concurrent indexes as not ready */
+	foreach(lc, concurrentIndexIds)
+	{
+		Oid			indOid = lfirst_oid(lc);
+		Oid			relOid;
+
+		StartTransactionCommand();
+		relOid = IndexGetRelation(indOid, false);
+
+		/*
+		 * Finish the index invalidation and set it as dead. It is not
+		 * necessary to wait for virtual locks on the parent relation as it
+		 * is already sure that this session holds sufficient locks.s
+		 */
tiny typo (lock.s)
+	/*
+	 * Phase 6 of REINDEX CONCURRENTLY
+	 *
+	 * Drop the concurrent indexes. This needs to be done through
+	 * performDeletion or related dependencies will not be dropped for the old
+	 * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not used
+	 * as here the indexes are already considered as dead and invalid, so they
+	 * will not be used by other backends.
+	 */
+	foreach(lc, concurrentIndexIds)
+	{
+		Oid			indexOid = lfirst_oid(lc);
+
+		/* Start transaction to drop this index */
+		StartTransactionCommand();
+
+		/* Get fresh snapshot for next step */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/*
+		 * Open transaction if necessary, for the first index treated its
+		 * transaction has been already opened previously.
+		 */
+		index_concurrent_drop(indexOid);
+
+		/*
+		 * For the last index to be treated, do not commit transaction yet.
+		 * This will be done once all the locks on indexes and parent relations
+		 * are released.
+		 */
Hm. This doesn't seem to commit the last transaction at all right now?
Not sure why UnlockRelationIdForSession needs to be run in a transaction
anyway?
+		if (indexOid != llast_oid(concurrentIndexIds))
+		{
+			/* We can do away with our snapshot */
+			PopActiveSnapshot();
+
+			/* Commit this transaction to make the update visible. */
+			CommitTransactionCommand();
+		}
+	}
+
+	/*
+	 * Last thing to do is release the session-level lock on the parent table
+	 * and the indexes of table.
+	 */
+	foreach(lc, relationLocks)
+	{
+		LockRelId	lockRel = * (LockRelId *) lfirst(lc);
+		UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+	}
+
+	return true;
+}
+
+
+	/*
+	 * Check the case of a system index that might have been invalidated by a
+	 * failed concurrent process and allow its drop.
+	 */
This is only possible for toast indexes right now, right? If so, the
comment should mention that.
+	if (IsSystemClass(classform) &&
+		relkind == RELKIND_INDEX)
+	{
+		HeapTuple	locTuple;
+		Form_pg_index indexform;
+		bool		indisvalid;
+
+		locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+		if (!HeapTupleIsValid(locTuple))
+		{
+			ReleaseSysCache(tuple);
+			return;
+		}
+
+		indexform = (Form_pg_index) GETSTRUCT(locTuple);
+		indisvalid = indexform->indisvalid;
+		ReleaseSysCache(locTuple);
+
+		/* Leave if index entry is not valid */
+		if (!indisvalid)
+		{
+			ReleaseSysCache(tuple);
+			return;
+		}
+	}
+
Ok, thats what I have for now...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the review. All your comments are addressed and updated patches
are attached.
Please see below for the details, and if you find anything else just let me
know.
On Tue, Mar 5, 2013 at 6:27 PM, Andres Freund <andres@2ndquadrant.com>wrote:
Have you benchmarked the toastrelidx removal stuff in any way? If not,
that's fine, but if yes I'd be interested.
No I haven't. Is it really that easily measurable? I think not, but I too
would be interested in looking at such results.
On 2013-03-04 22:33:53 +0900, Michael Paquier wrote:
+ ListCell *lc;
+	int			count = 0;
I find count a confusing name for a loop iteration variable... i, or
idxno, or ...
That's mostly a matter of personal style... But done for all the
functions I modified in this file.
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
Hm, I think we should move this into a macro, this is cropping up at
more and more places.
This is not necessary. RelationGetIndexList does a similar check at its
top, so I simply removed all those checks.
+	for (count = 0; count < num_indexes; count++)
+		index_insert(toastidxs[count], t_values, t_isnull,
+					 &(toasttup->t_self),
+					 toastrel,
+					 toastidxs[count]->rd_index->indisunique ?
+					 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
The indisunique check looks like a copy & pasto to me, albeit not
yours...
Yes, it is normally the same for all the indexes, but it looks more solid
to me to keep it as it is. So unchanged.
+	/*
+	 * We actually use only the first index but taking a lock on all is
+	 * necessary.
+	 */
Hm, is it guaranteed that the first index is valid?
Not at all. Fixed. If all the indexes are invalid, an error is returned.
+	 * If we're swapping two toast tables by content, do the same for all of
+	 * their indexes. The swap can actually be safely done only if all the indexes
+	 * have valid Oids.
What's an index without a valid oid?
That's a good question... I re-read the code and it didn't make any sense,
so I switched to a check on an empty index list for both relations.
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid,
RowExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid,
RowExclusiveLock);
Shouldn't those be Access Exlusive Locks?
Yeah seems better for this swap.
+		/* Obtain index list if necessary */
+		if (toastRel1->rd_indexvalid == 0)
+			RelationGetIndexList(toastRel1);
+		if (toastRel2->rd_indexvalid == 0)
+			RelationGetIndexList(toastRel2);
+
+		/* Check if the swap is possible for all the toast indexes */
So there's no error being thrown if this turns out not to be possible?
No errors were thrown in the former process either... This should fail
silently, no?
+			if (count == 0)
+				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+						 OIDOldHeap);
+			else
+				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+						 OIDOldHeap, count);
+			RenameRelationInternal(lfirst_oid(lc),
+								   NewToastName);
+			count++;
+		}
Hm. It seems wrong that this layer needs to know about _cct.
Any other idea? For the time being I removed cct and added only a suffix
based on the index number...
 /*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
  * Must not be applied to non-TOAST relations.
  */
 static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
 {
...
+	/* Size is evaluated based on the first index available */
Uh. Why? Imo all indexes should be counted.
They are! Only the comment is incorrect. Fixed.
-#define CATALOG_VERSION_NO	201302181
+#define CATALOG_VERSION_NO	20130219
Think you forgot a digit here ;)
Fixed.
/*
* This case is currently only supported during a concurrent index
* rebuild, but there is no way to ask for it in the grammar otherwise
* anyway.
 */
Or similar.
Makes sense. Thanks.
+		ReleaseSysCache(constTuple);
+	}
Very, very nitpicky, but I find "constTuple" to be confusing, I thought
at first it meant that the tuple shouldn't be modified or something.
Made that clear.
+	/*
+	 * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
+	 */
+	isconstraint = indexRelation->rd_index->indisprimary ||
+		indexRelation->rd_index->indisexclusion;
unique constraints aren't mattering here?
No they are not. Unique indexes are not counted as constraints in the case
of index_create. Previous versions of the patch did that but there are
issues with unique indexes using expressions.
+/*
+ * index_concurrent_swap
+ *
+ * Replace old index by old index in a concurrent context. For the time being
+ * what is done here is switching the relation relfilenode of the indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+	Relation	oldIndexRel, newIndexRel, pg_class;
+	HeapTuple	oldIndexTuple, newIndexTuple;
+	Form_pg_class oldIndexForm, newIndexForm;
+	Oid			tmpnode;
+
+	/*
+	 * Take an exclusive lock on the old and new index before swapping them.
+	 */
+	oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+	newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+	/* Now swap relfilenode of those indexes */
Any chance to reuse swap_relation_files here? Not sure whether it would
be beneficial given that it is more generic and normally works on a
relation level...
Hum. I am not sure. The current way of doing is enough to my mind.
We probably should remove the fsm of the index altogether after this?
The freespace map? Not sure it is necessary here. Isn't it going to be
removed with the relation anyway?
+	/* The lock taken previously is not released until the end of transaction */
+	relation_close(oldIndexRel, NoLock);
+	relation_close(newIndexRel, NoLock);
It might be worthwhile adding a heap_freetuple here for (old,
new)IndexTuple, just to spare the reader the thinking whether it needs
to be done.
Indeed, I forgot some cleanup here. Fixed.
+/*
+ * index_concurrent_drop
+ */
"or dependencies of the index would not get dropped"?
Fixed.
+void
+index_concurrent_drop(Oid indexOid)
+{
+	Oid			constraintOid = get_index_constraint(indexOid);
+	ObjectAddress object;
+	Form_pg_index indexForm;
+	Relation	pg_index;
+	HeapTuple	indexTuple;
+	bool		indislive;
+
+	/*
+	 * Check that the index dropped here is not alive, it might be used by
+	 * other backends in this case.
+	 */
+	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									 ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+	indislive = indexForm->indislive;
+
+	/* Clean up */
+	heap_close(pg_index, RowExclusiveLock);
+
+	/* Leave if index is still alive */
+	if (indislive)
+		return;
This seems like a confusing path? Why is it valid to get here with a
valid index and why is it ok to silently ignore that case?
I added that because of a comment in one of the past reviews. Personally,
I think it makes more sense to remove it for clarity.
+		case RELKIND_RELATION:
+			{
+				/*
+				 * In the case of a relation, find all its indexes
+				 * including toast indexes.
+				 */
+				Relation	heapRelation = heap_open(relationOid,
+													 ShareUpdateExclusiveLock);
Hm. This means we will not notice having about-to-be dropped indexes
around. Which seems safe because locks will prevent that anyway...
I think that's OK as-is.
+		default:
+			/* nothing to do */
+			break;
Shouldn't we error out?
Don't think so. For example, what if the relation is a matview? For REINDEX
DATABASE this could end in an error because a materialized view is
listed as a relation to reindex. I prefer having this path fail silently
and leave if there are no indexes.
+	foreach(lc, indexIds)
+	{
+		Relation	indexRel;
+		Oid			indOid = lfirst_oid(lc);
+		Oid			concurrentOid = lfirst_oid(lc2);
+		bool		primary;
+
+		/* Move to next concurrent item */
+		lc2 = lnext(lc2);
forboth()
Oh, I didn't know this trick. Thanks.
+	/*
+	 * Phase 3 of REINDEX CONCURRENTLY
+	 *
+	 * During this phase the concurrent indexes catch up with the INSERT that
+	 * might have occurred in the parent table and are marked as valid once done.
+	 *
+	 * We once again wait until no transaction can have the table open with
+	 * the index marked as read-only for updates. Each index validation is done
+	 * with a separate transaction to avoid opening transaction for an
+	 * unnecessary too long time.
+	 */
Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.
Yes you said the contrary in a previous review. The purpose of this
function is to first gather the locks and then wait for everything at once
to reduce possible conflicts.
+		/*
+		 * Finish the index invalidation and set it as dead. It is not
+		 * necessary to wait for virtual locks on the parent relation as it
+		 * is already sure that this session holds sufficient locks.s
+		 */
tiny typo (lock.s)
Fixed.
+	/*
+	 * Phase 6 of REINDEX CONCURRENTLY
+	 *
+	 * Drop the concurrent indexes. This needs to be done through
+	 * performDeletion or related dependencies will not be dropped for the old
+	 * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not used
+	 * as here the indexes are already considered as dead and invalid, so they
+	 * will not be used by other backends.
+	 */
+	foreach(lc, concurrentIndexIds)
+	{
+		Oid			indexOid = lfirst_oid(lc);
+
+		/* Start transaction to drop this index */
+		StartTransactionCommand();
+
+		/* Get fresh snapshot for next step */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/*
+		 * Open transaction if necessary, for the first index treated its
+		 * transaction has been already opened previously.
+		 */
+		index_concurrent_drop(indexOid);
+
+		/*
+		 * For the last index to be treated, do not commit transaction yet.
+		 * This will be done once all the locks on indexes and parent relations
+		 * are released.
+		 */
Hm. This doesn't seem to commit the last transaction at all right now?
It is better like this. The end of the process needs to be done inside a
transaction, so it makes sense not to commit the last drop immediately, no?
Not sure why UnlockRelationIdForSession needs to be run in a transaction
anyway?
Even in the case of CREATE INDEX CONCURRENTLY, UnlockRelationIdForSession
is run inside a transaction block.
+	/*
+	 * Check the case of a system index that might have been invalidated by a
+	 * failed concurrent process and allow its drop.
+	 */
This is only possible for toast indexes right now, right? If so, the
comment should mention that.
Yes, fixed. I mentioned that in the comment.
--
Michael
Attachments:
20130305_2_reindex_concurrently_v17.patch
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..051ebd7 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+   <command>REINDEX</> will perform a concurrent build if <literal>
+   CONCURRENTLY</> is specified. To build the index without interfering
+   with production you should either drop the index and reissue the
+   <command>CREATE INDEX CONCURRENTLY</> command, or reissue
+   <command>REINDEX CONCURRENTLY</>. Indexes of toast relations can be
+   rebuilt with <command>REINDEX CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,112 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+   When this option is used, <productname>PostgreSQL</> must perform two
+   scans of the table for each index that needs to be rebuilt and, in
+   addition, it must wait for all existing transactions that could
+   potentially use the index to terminate. This method requires more total
+   work than a standard index rebuild and takes significantly longer to
+   complete as it needs to wait for unfinished transactions that might
+   modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+   In a concurrent index rebuild, a new index whose storage will replace
+   the one to be rebuilt is first entered into the system catalogs in one
+   transaction, then two table scans occur in two more transactions to
+   build the new index and make it valid for the other backends. Once this
+   is done, the old and fresh indexes are swapped, and the index used
+   during the process is marked as invalid in another transaction. Finally,
+   two additional transactions are used to mark the concurrent index as not
+   ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+   The concurrent index created during the processing has a name ending in
+   the suffix <literal>_cct</>. This works as well with indexes of toast
+   relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+   </command> rebuilds only the non-system relations concurrently. System
+   relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +386,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+   Rebuild the indexes of a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..2209942 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that duplicates an existing index,
+ * as done during a concurrent reindex operation. This index can
+ * also be a toast index. Sufficient locks are normally already taken on
+ * the related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild, but there is no way to ask for it in the grammar otherwise
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1095,6 +1105,425 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool isconstraint;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+	/* Get the expressions associated with this index, needed to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+		char *origname, *curname;
+		int j;
+		char buf[NAMEDATALEN];
+
+		AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+
+		/* Pick up column name depending on attribute type */
+		if (attnum != 0)
+		{
+			/*
+			 * This is a column attribute, so simply pick the column name
+			 * from the relation.
+			 */
+			Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+			origname = pstrdup(NameStr(attform->attname));
+		}
+		else
+		{
+			Node *indnode;
+
+			/*
+			 * This is the case of an expression, so pick up the expression
+			 * name.
+			 */
+			Assert(indexpr_item != NULL);
+			indnode = (Node *) lfirst(indexpr_item);
+			indexpr_item = lnext(indexpr_item);
+			origname = deparse_expression(indnode,
+				deparse_context_for(RelationGetRelationName(heapRelation),
+									RelationGetRelid(heapRelation)),
+				false, false);
+		}
+
+		/*
+		 * Check if the name picked conflicts with any existing name, and
+		 * adjust it if it does.
+		 */
+		curname = origname;
+		for (j = 1;; j++)
+		{
+			ListCell *lc2;
+			char nbuf[32];
+			int nlen;
+
+			foreach(lc2, columnNames)
+			{
+				if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+					break;
+			}
+			if (lc2 == NULL)
+				break;			/* found nonconflicting name */
+
+			sprintf(nbuf, "%d", j);
+
+			/* Ensure generated names are shorter than NAMEDATALEN */
+			nlen = pg_mbcliplen(origname, strlen(origname),
+								NAMEDATALEN - 1 - strlen(nbuf));
+			memcpy(buf, origname, nlen);
+			strcpy(buf + nlen, nbuf);
+			curname = buf;
+		}
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /*
+ * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
+ */
+ isconstraint = indexRelation->rd_index->indisprimary ||
+ indexRelation->rd_index->indisexclusion;
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+	classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ isconstraint, /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed, so as to block only schema changes.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new one in a concurrent context. For the time
+ * what is done here is switching the relation relfilenode of the indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initiating an
+ * index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently, as the last step of a concurrent index
+ * process. Deletion is done through performDeletion, or dependencies of the
+ * index would not get dropped. At this point all the indexes are already
+ * considered invalid and dead, so they can be dropped without using any
+ * concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+ bool indislive;
+
+ /*
+	 * Check that the index dropped here is not alive; if it were, it might
+	 * still be used by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+	indislive = indexForm->indislive;
+
+	/* A concurrently-dropped index is expected to be already dead */
+	Assert(!indislive);
+
+	/* Clean up */
+	heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1753,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1834,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1863,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1877,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..d74e7d0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,27 +681,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,74 +740,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,509 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation can
+ * be either an index or a table. If a table is specified, each step of the
+ * process is applied to all of the table's indexes at once, including the
+ * indexes of its dependent toast table.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including the indexes of
+ * its associated toast table. If the relkind is an index, only that index
+ * is rebuilt. The locks taken on the parent relations and the involved
+ * indexes are kept until this transaction commits, to protect against
+ * schema changes that might occur before the session lock is taken on
+ * each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be reindexed concurrently.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Nothing to do */
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the concurrent rebuild process. For each index, we first
+ * create a new index based on the same data as the former one; at this
+ * stage it is only registered in the catalogs and will be built later.
+ * Each step of the process can be applied at once to all the indexes of
+ * a parent relation, including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index parent relation, which might be a toast or plain relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ NIL,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also needed
+ * on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save palloc'd copies of the lockrelids to protect each concurrent
+ * relation from being dropped, then close the relations. The entries
+ * must be copies because the list outlives this loop iteration. The
+ * lockrelid of the parent relation is not added here to avoid taking
+ * multiple locks on the same relation; we rely on parentRelationIds
+ * built earlier instead.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the locks and lock tags of the parent relations for the later
+ * visibility checks; other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the parent relation's lockrelid to the list
+ * of locked relations; the entry must outlive this loop iteration.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close the heap relation, keeping the lock */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The new
+ * index is marked as not ready and invalid so that no other transaction
+ * will try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction to avoid keeping
+ * a transaction open for an unnecessarily long time. A concurrent build
+ * is done for each new index that will replace an old one. Before doing
+ * that, we need to wait on the parent relations until no running
+ * transaction could still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Start a new transaction for this concurrent index build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so reopen
+ * it. Save the parent relation's Oid before closing the index; the
+ * relcache entry must not be dereferenced after index_close().
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ heapOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(heapOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index to mark it as ready
+ * for inserts. Once we commit this transaction, any new transaction
+ * that opens the table must insert new entries into the index for
+ * insertions and non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible to other sessions.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for validating the
+ * concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid in the sense that it contains all
+ * the necessary tuples. However, it might not contain tuples deleted
+ * just before the reference snapshot was taken, so we still need to
+ * wait for the transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The concurrent index can now be marked valid -- update its pg_index
+ * entry.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The pg_index update will cause other backends to refresh their
+ * entries for the concurrent index itself, but we must also invalidate
+ * the relcache entry of the parent relation so that cached plans are
+ * rebuilt.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index data is then marked invalid, so that no other backend can use
+ * it once its associated transaction commits.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /*
+ * Open the index and its parent relation so that their relcache
+ * entries can be invalidated below. ShareUpdateExclusiveLock is
+ * sufficient here, as a session-level lock of the same mode is
+ * already held on both relations, protecting them from being dropped.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the concurrent index as invalid; after the swap it holds the old data */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes. They are marked as dead so that transactions that might
+ * still use them stop doing so. Each operation is performed in a
+ * separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and mark the index as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session already holds sufficient locks on it.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies of the old indexes would not be
+ * dropped. The internal mechanism of DROP INDEX CONCURRENTLY is not
+ * used, because the indexes are already considered dead and invalid,
+ * so no other backend will use them.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get a fresh snapshot for the next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Perform the drop of this concurrent index */
+ index_concurrent_drop(indexOid);
+
+ /*
+ * Do not commit the transaction for the last index yet; that is done
+ * once all the locks on the indexes and parent relations have been
+ * released.
+ */
+ if (indexOid != llast_oid(concurrentIndexIds))
+ {
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the parent
+ * tables and their indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1940,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1967,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2086,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2165,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2210,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2225,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for system catalogs, but it
+ * is allowed for a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2317,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed with the normal process, pg_class included, as
+ * they could be corrupted, and the concurrent process might itself use
+ * them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0d6f5c0..e11f3f6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check whether a system index might have been left invalid by a failed
+ * concurrent process, and if so allow it to be dropped. For the time
+ * being, this only concerns toast relation indexes that became invalid
+ * during a REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* If the index entry is not valid, allow the drop */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context. Since CREATE INDEX CONCURRENTLY does not support exclusion
+ * constraints, this code path can only be reached through REINDEX
+ * CONCURRENTLY. In that case a valid equivalent of this index exists
+ * in parallel, so the check can be bypassed here: it has already been
+ * done on that other index. If exclusion constraint support is ever
+ * added to CREATE INDEX CONCURRENTLY, this will have to be revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the
+ * relations referred to by the given LOCKTAGs. To do this, inquire which
+ * xacts currently would conflict with lockmode on each relation -- ie,
+ * which ones have a lock that permits writing it -- then wait for each of
+ * these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before the
+ * snapshot was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..db2a531 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..88ec81a 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,54 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a0b2ae2 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,39 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Attachment: 20130305_1_remove_reltoastidxid_v3.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..6db6851 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 81c1be3..e1475e6 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..37dc3e9 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple
+ * identical indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1337,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * Normal case: just choose an unused OID. Any of the indexes can
+ * be used for the uniqueness check, so simply use the first one.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1397,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1436,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1464,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1491,15 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel;
+ Relation validtoastidx = NULL;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
+ bool found = false;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,38 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * Only the first valid index is actually used for the scan below, but
+ * all of them need to be locked.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ toastidxs[i] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Remember the first valid index; it is used for the scan below */
+ if (!found && toastidxs[i]->rd_index->indisvalid)
+ {
+ found = true;
+ validtoastidx = toastidxs[i];
+ }
+ i++;
+ }
+
+ /* This should not happen, but fail cleanly if no valid index was found */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation \"%s\"",
+ RelationGetRelationName(toastrel));
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1554,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1568,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1588,9 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the index list of the toast relation is computed */
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1600,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first available index for the scan.
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1647,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1662,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1680,17 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1709,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1798,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1817,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1840,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1885,17 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1936,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2033,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0ecfc78..043b279 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..e73bf55 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,51 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if both relations
+ * actually have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (toastRel1->rd_indexlist != NIL &&
+ toastRel2->rd_indexlist != NIL &&
+ list_length(toastRel1->rd_indexlist) == list_length(toastRel2->rd_indexlist))
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair of indexes */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1545,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1560,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast index name, while any
+ * following entries are assumed to be concurrently created indexes
+ * and are renamed with a "_cct" suffix.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..0d6f5c0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8678,7 +8678,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8686,6 +8685,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8729,7 +8730,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8808,8 +8810,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 8963266..3dd2fda 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -577,8 +577,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -590,7 +590,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the indexes available */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e6c85ac..f15e6a2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index ab91ab0..81f049b 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201302181
+#define CATALOG_VERSION_NO 201303051
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
On Tue, Mar 5, 2013 at 10:35 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Thanks for the review. All your comments are addressed and updated patches
are attached.
I got the compile warnings:
tuptoaster.c:1539: warning: format '%s' expects type 'char *', but
argument 3 has type 'Oid'
tuptoaster.c:1539: warning: too many arguments for format
The patch doesn't handle the index on the materialized view correctly.
=# CREATE TABLE hoge (i int);
CREATE TABLE
=# CREATE MATERIALIZED VIEW hogeview AS SELECT * FROM hoge;
SELECT 0
=# CREATE INDEX hogeview_idx ON hogeview(i);
CREATE INDEX
=# REINDEX TABLE hogeview;
REINDEX
=# REINDEX TABLE CONCURRENTLY hogeview;
NOTICE: table "hogeview" has no indexes
REINDEX
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
Thanks for the review. All your comments are addressed and updated patches
are attached.
Please see below for the details, and if you find anything else just let me know.

On Tue, Mar 5, 2013 at 6:27 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Have you benchmarked the toastrelidx removal stuff in any way? If not,
thats fine, but if yes I'd be interested.

No I haven't. Is it really that easily measurable? I think not, but I too
would be interested in looking at such results.
I don't think it's really measurable, at least not for modifications. But
istm that the onus to prove that to some degree is upon the patch.
+ if (toastrel->rd_indexvalid == 0)
+ RelationGetIndexList(toastrel);
Hm, I think we should move this into a macro; this is cropping up in
more and more places.

This is not necessary. RelationGetIndexList does a similar check at its
top, so I simply removed all those checks.
Well, in some of those cases a function call might be noticeable
(probably only in the toast fetch path). That's why I suggested putting
the above in a macro...
+	for (count = 0; count < num_indexes; count++)
+		index_insert(toastidxs[count], t_values, t_isnull,
+					 &(toasttup->t_self),
+					 toastrel,
+					 toastidxs[count]->rd_index->indisunique ?
+					 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
The indisunique check looks like a copy & pasto to me, albeit not
yours...

Yes, it is normally the same for all the indexes, but it looks more solid to
me to do it as it is. So unchanged.
Hm, if the toast indexes aren't unique anymore loads of stuff would be
broken. Anyway, not your "fault".
+	/* Obtain index list if necessary */
+	if (toastRel1->rd_indexvalid == 0)
+		RelationGetIndexList(toastRel1);
+	if (toastRel2->rd_indexvalid == 0)
+		RelationGetIndexList(toastRel2);
+
+	/* Check if the swap is possible for all the toast indexes */
So there's no error being thrown if this turns out not to be possible?
There are also no errors in the former process... This should fail
silently, no?
Not sure what you mean by "former process"? So far I don't see any
reason why it would be a good idea to fail silently. We end up with
corrupt data if the swap is silently not performed.
+		if (count == 0)
+			snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+					 OIDOldHeap);
+		else
+			snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+					 OIDOldHeap, count);
+		RenameRelationInternal(lfirst_oid(lc),
+							   NewToastName);
+		count++;
+	}

Hm. It seems wrong that this layer needs to know about _cct.
Any other idea? For the time being I removed cct and added only a suffix
based on the index number...
Hm. It seems like throwing an error would be sufficient, that path is
only entered for shared catalogs, right? Having multiple toast indexes
would be a bug.
+	/*
+	 * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
+	 */
+	isconstraint = indexRelation->rd_index->indisprimary ||
+		indexRelation->rd_index->indisexclusion;

Unique constraints don't matter here?
No they are not. Unique indexes are not counted as constraints in the case
of index_create. Previous versions of the patch did that but there are
issues with unique indexes using expressions.
Hm. index_create's comment says:
* isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint
There are unique indexes that are constraints and some that are
not. Looking at ->indisunique is not sufficient to determine whether it's
one or not.
We probably should remove the fsm of the index altogether after this?
The freespace map? Not sure it is necessary here. Isn't it going to be
removed with the relation anyway?
I had a thinko here, forgot what I said. I thought the freespace map
would be the one from the old index, but that's clearly bogus. Comes from
writing reviews after having to leave home at 5 in the morning to catch
a plane ;)
+void
+index_concurrent_drop(Oid indexOid)
+{
+	Oid			constraintOid = get_index_constraint(indexOid);
+	ObjectAddress object;
+	Form_pg_index indexForm;
+	Relation	pg_index;
+	HeapTuple	indexTuple;
+	bool		indislive;
+
+	/*
+	 * Check that the index dropped here is not alive, it might be used by
+	 * other backends in this case.
+	 */
+	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									 ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+	indislive = indexForm->indislive;
+
+	/* Clean up */
+	heap_close(pg_index, RowExclusiveLock);
+
+	/* Leave if index is still alive */
+	if (indislive)
+		return;

This seems like a confusing path? Why is it valid to get here with a
valid index, and why is it OK to silently ignore that case?

I added that because of a comment in one of the past reviews. Personally I
think it makes more sense to remove that for clarity.
Imo it should be an elog(ERROR) or an Assert().
+		case RELKIND_RELATION:
+			{
+				/*
+				 * In the case of a relation, find all its indexes
+				 * including toast indexes.
+				 */
+				Relation	heapRelation = heap_open(relationOid,
+													 ShareUpdateExclusiveLock);

Hm. This means we will not notice having about-to-be-dropped indexes
around. Which seems safe because locks will prevent that anyway...

I think that's OK as-is.

Yes. Just thinking out loud.
+		default:
+			/* nothing to do */
+			break;

Shouldn't we error out?

Don't think so. For example, what if the relation is a matview? For REINDEX
DATABASE this could finish as an error because a materialized view is
listed as a relation to reindex. I prefer having this path fail silently
and leave if there are no indexes.
Imo default fallthroughs make it harder to adjust code. And afaik it's
legal to add indexes to materialized views, which kinda proves my point.
And if that path is reached for plain views, sequences or toast tables
it's an error.
+	/*
+	 * Phase 3 of REINDEX CONCURRENTLY
+	 *
+	 * During this phase the concurrent indexes catch up with the INSERT that
+	 * might have occurred in the parent table and are marked as valid once done.
+	 *
+	 * We once again wait until no transaction can have the table open with
+	 * the index marked as read-only for updates. Each index validation is done
+	 * with a separate transaction to avoid opening transaction for an
+	 * unnecessary too long time.
+	 */

Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.

Yes, you said the contrary in a previous review. The purpose of this
function is to first gather the locks and then wait for everything at once
to reduce possible conflicts.
you say:
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * with a separate transaction to avoid opening transaction for an
+ * unnecessary too long time.
Which doesn't seem to be done?
I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
not being necessary in this phase. Which I think is correct. Anyway, if
I claimed otherwise, I think I was wrong:
The reason - I think - we need to wait here is that otherwise its not
guaranteed that all other backends see the index with ->isready
set. Which means they might add tuples which are invisible to the mvcc
snapshot passed to validate_index() (just created beforehand) which are
not yet added to the new index because those backends think the index is
not ready yet.
Any flaws in that logic?
...
Yes, reading the comments of validate_index() and the old implementation
seems to make my point.
+	/*
+	 * Phase 6 of REINDEX CONCURRENTLY
+	 *
+	 * Drop the concurrent indexes. This needs to be done through
+	 * performDeletion or related dependencies will not be dropped for the old
+	 * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not used
+	 * as here the indexes are already considered as dead and invalid, so they
+	 * will not be used by other backends.
+	 */
+	foreach(lc, concurrentIndexIds)
+	{
+		Oid			indexOid = lfirst_oid(lc);
+
+		/* Start transaction to drop this index */
+		StartTransactionCommand();
+
+		/* Get fresh snapshot for next step */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/*
+		 * Open transaction if necessary, for the first index treated its
+		 * transaction has been already opened previously.
+		 */
+		index_concurrent_drop(indexOid);
+
+		/*
+		 * For the last index to be treated, do not commit transaction yet.
+		 * This will be done once all the locks on indexes and parent relations
+		 * are released.
+		 */

Hm. This doesn't seem to commit the last transaction at all right now?

It is better like this. The end of the process needs to be done inside a
transaction, so not committing the last drop immediately makes sense, no?
I pretty much dislike this. If we need to leave a transaction open
(why?), that should happen a function layer above.
Not sure why UnlockRelationIdForSession needs to be run in a transaction
anyway?

Even in the case of CREATE INDEX CONCURRENTLY, UnlockRelationIdForSession
is run inside a transaction block.
I have no problem with doing so, I just dislike the way that's done in the
loop. You can just open a new one if it's required; a transaction is
cheap, especially if it doesn't even acquire an xid.
Looking good.
I'll do some actual testing instead of just reviewing now...
Greetings,
Andres Freund
On Tue, Mar 5, 2013 at 11:22 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Mar 5, 2013 at 10:35 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

Thanks for the review. All your comments are addressed and updated patches
are attached.
I got the compile warnings:
tuptoaster.c:1539: warning: format '%s' expects type 'char *', but
argument 3 has type 'Oid'
tuptoaster.c:1539: warning: too many arguments for format
Fixed. Thanks for catching that.
The patch doesn't handle the index on the materialized view correctly.
Hehe... I didn't know that materialized views could have indexes...
I fixed it, will send updated patch once I am done with Andres' comments.
--
Michael
Please find attached updated patch realigned with your comments. You can
find my answers inline...
The only thing that needs clarification is the comment about
UNIQUE_CHECK_YES/UNIQUE_CHECK_NO. Except that all the other things are
corrected or adapted to what you wanted. I am also including now tests for
matviews.
On Wed, Mar 6, 2013 at 1:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
+	for (count = 0; count < num_indexes; count++)
+		index_insert(toastidxs[count], t_values, t_isnull,
+					 &(toasttup->t_self),
+					 toastrel,
+					 toastidxs[count]->rd_index->indisunique ?
+					 UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);

The indisunique check looks like a copy & pasto to me, albeit not
yours...

Yes, it is normally the same for all the indexes, but it looks more solid to
me to do it as it is. So unchanged.
Hm, if the toast indexes aren't unique anymore loads of stuff would be
broken. Anyway, not your "fault".
I definitely cannot understand where you are going here. Could you be more
explicit? Why could this be a problem? Without my patch a similar check is
used for toast indexes.
+	/* Obtain index list if necessary */
+	if (toastRel1->rd_indexvalid == 0)
+		RelationGetIndexList(toastRel1);
+	if (toastRel2->rd_indexvalid == 0)
+		RelationGetIndexList(toastRel2);
+
+	/* Check if the swap is possible for all the toast indexes */

So there's no error being thrown if this turns out not to be possible?

There are also no errors in the former process... This should fail
silently, no?

Not sure what you mean by "former process"? So far I don't see any
reason why it would be a good idea to fail silently. We end up with
corrupt data if the swap is silently not performed.
OK added an error and a check on the size of rd_indexlist to make things
better suited.
+		if (count == 0)
+			snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+					 OIDOldHeap);
+		else
+			snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+					 OIDOldHeap, count);
+		RenameRelationInternal(lfirst_oid(lc),
+							   NewToastName);
+		count++;
+	}

Hm. It seems wrong that this layer needs to know about _cct.

Any other idea? For the time being I removed cct and added only a suffix
based on the index number...

Hm. It seems like throwing an error would be sufficient, that path is
only entered for shared catalogs, right? Having multiple toast indexes
would be a bug.
Don't think so. Even if now those APIs are used only for catalog tables, I
do not believe that this function has been designed to be used only with
shared catalogs. Removing the cct suffix makes sense though...
+	/*
+	 * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
+	 */
+	isconstraint = indexRelation->rd_index->indisprimary ||
+		indexRelation->rd_index->indisexclusion;

Unique constraints don't matter here?

No, they do not. Unique indexes are not counted as constraints in the case
of index_create. Previous versions of the patch did that but there are
issues with unique indexes using expressions.

Hm. index_create's comment says:

* isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint

There are unique indexes that are constraints and some that are
not. Looking at ->indisunique is not sufficient to determine whether it's
one or not.
Hum... OK. I changed that using a method based on get_index_constraint for
a given index. So if the constraint Oid is invalid, it means that this
index has no constraints and its concurrent entry won't create an index in
consequence. It is more stable this way.
+void
+index_concurrent_drop(Oid indexOid)
+{
+	Oid			constraintOid = get_index_constraint(indexOid);
+	ObjectAddress object;
+	Form_pg_index indexForm;
+	Relation	pg_index;
+	HeapTuple	indexTuple;
+	bool		indislive;
+
+	/*
+	 * Check that the index dropped here is not alive, it might be used by
+	 * other backends in this case.
+	 */
+	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									 ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+	indislive = indexForm->indislive;
+
+	/* Clean up */
+	heap_close(pg_index, RowExclusiveLock);
+
+	/* Leave if index is still alive */
+	if (indislive)
+		return;

This seems like a confusing path? Why is it valid to get here with a
valid index, and why is it OK to silently ignore that case?

I added that because of a comment in one of the past reviews. Personally I
think it makes more sense to remove that for clarity.
Imo it should be an elog(ERROR) or an Assert().
Assert. Added.
+		default:
+			/* nothing to do */
+			break;

Shouldn't we error out?

Don't think so. For example, what if the relation is a matview? For REINDEX
DATABASE this could finish as an error because a materialized view is
listed as a relation to reindex. I prefer having this path fail silently
and leave if there are no indexes.
Imo default fallthroughs make it harder to adjust code. And afaik it's
legal to add indexes to materialized views, which kinda proves my point.
And if that path is reached for plain views, sequences or toast tables
it's an error.
Added an error message. Matviews are now correctly handled (per the report
from Masao).
+	/*
+	 * Phase 3 of REINDEX CONCURRENTLY
+	 *
+	 * During this phase the concurrent indexes catch up with the INSERT that
+	 * might have occurred in the parent table and are marked as valid once done.
+	 *
+	 * We once again wait until no transaction can have the table open with
+	 * the index marked as read-only for updates. Each index validation is done
+	 * with a separate transaction to avoid opening transaction for an
+	 * unnecessary too long time.
+	 */

Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.

Yes, you said the contrary in a previous review. The purpose of this
function is to first gather the locks and then wait for everything at once
to reduce possible conflicts.

you say:

+	 * We once again wait until no transaction can have the table open with
+	 * the index marked as read-only for updates. Each index validation is done
+	 * with a separate transaction to avoid opening transaction for an
+	 * unnecessary too long time.

Which doesn't seem to be done?
I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
not being necessary in this phase. Which I think is correct.
Regarding CacheInvalidateRelcacheByRelid at phase 3, I think that it is
needed. If we don't use it, the pg_index entries will be updated but not the
cache, which is incorrect.
Anyway, if I claimed otherwise, I think I was wrong:
The reason - I think - we need to wait here is that otherwise its not
guaranteed that all other backends see the index with ->isready
set. Which means they might add tuples which are invisible to the mvcc
snapshot passed to validate_index() (just created beforehand) which are
not yet added to the new index because those backends think the index is
not ready yet.
Any flaws in that logic?
Not that I can think of. In consequence, and I think we will agree on that: I am
removing WaitForMultipleVirtualLocks and adding a WaitForVirtualLock on the
parent relation for EACH index before building and validating it.
It is better like this. The end of the process needs to be done inside a
transaction, so not committing immediately the last drop makes sense, no?
I pretty much dislike this. If we need to leave a transaction open
(why?), that should happen a function layer above.
Changed as requested.
Not sure why UnlockRelationIdForSession needs to be run in a transaction
anyway?
Even in the case of CREATE INDEX CONCURRENTLY,
UnlockRelationIdForSession
is run inside a transaction block.
I have no problem with doing so, I just dislike the way that's done in the
loop. You can just open a new one if it's required; a transaction is
cheap, especially if it doesn't even acquire an xid.
OK. Doing the end of the transaction in a separate transaction and doing
the unlocking out of the transaction block...
--
Michael
Attachments:
20130306_1_remove_reltoastidxid_v4.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..6db6851 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 81c1be3..e1475e6 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..79af64f 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1337,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * Normal case: just choose an unused OID. Checking uniqueness
+ * against the first index is enough, as all are identical.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1397,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1436,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1464,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1491,15 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx = NULL;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
+ bool found = false;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,37 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index, but a lock needs to
+ * be taken on all of them.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ toastidxs[i] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Remember the first valid index; it is used for the scan below */
+ if (!found && toastidxs[i]->rd_index->indisvalid)
+ {
+ found = true;
+ validtoastidx = toastidxs[i];
+ }
+ i++;
+ }
+
+ /* This should not happen, but check the case of no valid indexes */
+ if (!found)
+ {
+ /* No valid indexes found, so leave with an error */
+ elog(ERROR, "no valid indexes found for toast relation %s",
+ RelationGetRelationName(toastrel));
+ }
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1553,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1567,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1587,9 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the index list of the toast relation is computed */
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1599,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first available index for the scan.
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1646,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1661,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1679,17 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1708,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1797,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1816,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1839,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1884,17 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1935,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2032,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0ecfc78..043b279 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indexrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..d3e1da4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,61 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can only be done safely if both
+ * relations actually have toast indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair of indexes */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, which
+ * cannot have multiple indexes on their toast relations, simply
+ * raise an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1555,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1570,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the standard toast index name, and
+ * any following entries get a numbered suffix.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..0d6f5c0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8678,7 +8678,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8686,6 +8685,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8729,7 +8730,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8808,8 +8810,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 8963266..3dd2fda 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -577,8 +577,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -590,7 +590,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the available indexes */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e6c85ac..f15e6a2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indexrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130306_2_reindex_concurrently_v18.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..051ebd7 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production, you should either drop the index and reissue the
+ <command>CREATE INDEX CONCURRENTLY</> command, or simply issue
+ <command>REINDEX CONCURRENTLY</>. Indexes of toast relations can
+ also be rebuilt with <command>REINDEX CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,112 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index whose storage will replace
+ that of the one to be rebuilt is first entered into the system catalogs
+ in one transaction, then two table scans occur in two more transactions
+ to build the new index and make it valid for the other backends. Once
+ this is done, the old and fresh indexes are swapped, and the index used
+ during the process is marked as invalid in another transaction. Finally,
+ two additional transactions are used to mark this invalid index as not
+ ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending
+ with the suffix <literal>_cct</>. This also works for indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table is allowed in the meantime. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +386,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..b2895f2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create the index as a duplicate of an existing
+ * index, as done during a concurrent reindex operation. This index can
+ * also be a toast relation. Sufficient locks are normally taken on
+ * the related relations once this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild, but there is no way to ask for it in the grammar otherwise
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1095,6 +1105,416 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get expressions associated to this index for compilation of column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum != 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the chosen name conflicts with any existing column name,
+ * and adjust it if so.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, indOid);
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken
+ * when this operation is performed, so as to block only schema changes.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new one in a concurrent context. For the
+ * time being, this is done by swapping the relfilenode of the indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * If requested by the caller, wait until no running transaction could
+ * be using the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and then invalidate the relcache
+ * of its parent relation. This function should be called when initiating
+ * an index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent reindex process.
+ * Deletion is done through performDeletion, or dependencies of the
+ * index would not get dropped. At this point the index is already
+ * considered invalid and dead, so it can be dropped without using any
+ * concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index to be dropped is not alive; if it were, it might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ Assert(!indexForm->indislive);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1744,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1825,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1854,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1868,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..28bc217 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,27 +681,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,74 +740,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,539 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation can
+ * be either an index or a table. If a table is specified, each phase of the
+ * process is run on all the indexes of the table at once, including the
+ * indexes of its toast table.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including the indexes of
+ * its toast table. If the relkind is an index, this index itself will be
+ * rebuilt. The locks taken on the parent relations and the involved
+ * indexes are kept until this transaction is committed, to protect
+ * against schema changes that might occur before the session lock is
+ * taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* An index cannot be rebuilt concurrently on a shared relation */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be rebuilt concurrently, so they are skipped with a warning.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes to rebuild, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. First,
+ * for each index, we create a new index based on the same data as the
+ * former one; it is only registered in the catalogs here and will be
+ * built later. Each operation can be performed on all the indexes of a
+ * parent relation at the same time, including the indexes of its toast
+ * relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; it might be a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock on it is
+ * needed as well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the LockRelId of the index and of its concurrent copy to
+ * protect them from being dropped while the operation runs, then
+ * close the relations. palloc'd copies are stored, as the list
+ * entries must survive this loop iteration. The lock of the parent
+ * relation is not registered here, to avoid taking multiple locks on
+ * the same relation; parentRelationIds, built above, is used for that
+ * instead.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock of each parent relation for the coming wait and
+ * visibility checks; other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a palloc'd copy of the parent relation lock to the list */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG of this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the indexes.
+ * This will prevent them from making incompatible HOT updates. The new
+ * indexes are marked as not ready and invalid, so that no other
+ * transaction will try to use them for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on each parent relation,
+ * each old index and each concurrent index, to ensure that none of
+ * them are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping a transaction open for an unnecessarily long time. Each
+ * concurrent build produces the replacement of an old index. Before
+ * each build, we wait until no running transaction could still have
+ * the parent table of the index open.
+ */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the LOCKTAG of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* The index relation was closed by the previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Perform the concurrent build of the new index. Use relOid fetched
+ * above for the parent relation: indexRel must not be dereferenced
+ * here, as it has just been closed.
+ */
+ index_concurrent_build(relOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any tuples
+ * inserted in the parent table while they were being built, and are
+ * marked as valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction, to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the LOCKTAG of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+ WaitForVirtualLocks(*heapLockTag, ShareLock);
+
+ /*
+ * Take the reference snapshot that will be used for validating the
+ * concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index; it might belong to a toast relation */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid in the sense that it contains all
+ * the tuples currently needed. However, it might not contain tuples
+ * deleted just before the reference snapshot was taken, so we have to
+ * wait out transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * The concurrent index can now be marked valid -- update its pg_index
+ * entry.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /*
+ * The pg_index update will cause other backends to refresh their
+ * entries for the concurrent index, but the relcache of the parent
+ * relation needs to be invalidated as well.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, each one
+ * is swapped with its corresponding old index. The entry left holding
+ * the old data is marked as invalid, making it unusable by other
+ * backends once the swapping transaction is committed.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation; their relcache entries
+ * are invalidated below. ShareUpdateExclusiveLock is taken here, in
+ * line with the session-level locks already held, to reduce the
+ * likelihood of deadlocks.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the concurrent index as invalid; it will hold the old data after the swap */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid);
+
+ /* Swap the relfilenodes of the old index and its concurrent copy */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes, so they need
+ * to be marked as dead to prevent remaining transactions from using
+ * them. Each operation is performed in a separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the invalidation of the index and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session surely holds sufficient locks on it.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies of the old indexes would not be
+ * dropped. The internal mechanism of DROP INDEX CONCURRENTLY is not
+ * used here, as the indexes are already considered dead and invalid,
+ * so they cannot be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the concurrent index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the
+ * parent relations and on all the involved indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get a fresh snapshot for the end of the process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1970,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1997,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2116,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2195,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2240,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2255,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed on system catalogs, but it is
+ * on the user relations of a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system catalogs concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2347,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or a normal process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed by a normal process, pg_class included, as
+ * they could be corrupted and the concurrent process itself might
+ * use them. Toast relations are not included here; they are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0d6f5c0..e11f3f6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check for the case of a system index that might have been invalidated
+ * by a failed concurrent operation, and allow it to be dropped. For the
+ * time being, this only concerns indexes of toast relations that became
+ * invalid during a REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when it was created in a concurrent
+ * context. As CREATE INDEX CONCURRENTLY is not available for exclusion
+ * constraints, this code path can only be reached during REINDEX
+ * CONCURRENTLY. In that case a copy of this index exists in parallel,
+ * so the check can be bypassed here: it has already been done on that
+ * other copy. If exclusion constraints become supported by CREATE INDEX
+ * CONCURRENTLY in the future, this shortcut should be removed or
+ * completed for that purpose.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..12e3145 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,118 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on the relation
+ * referred to by the given LOCKTAG. To do this, inquire which xacts
+ * currently would conflict with the given lockmode on that relation --
+ * ie, which ones have a lock that permits writing it -- then wait for
+ * each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForVirtualLocks(LOCKTAG locktag, LOCKMODE lockmode)
+{
+ VirtualTransactionId *old_lockholders;
+
+ old_lockholders = GetLockConflicts(&locktag, lockmode);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..db2a531 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..0b591ce 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,7 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
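The recheck loop in the WaitForOldSnapshots hunk above (forget any vxid that no longer shows up before blocking on it) can be sketched in plain Python. This is a simulation only: `get_current_vxids` and `vxid_lock_wait` are hypothetical stand-ins for GetCurrentVirtualXIDs and VirtualXactLock, not the real procarray API.

```python
# Sketch of WaitForOldSnapshots' recheck logic: before blocking on an old
# snapshot holder, re-fetch the current list of old vxids; any vxid that
# has disappeared can be forgotten without waiting on it.
def wait_for_old_snapshots(get_current_vxids, vxid_lock_wait):
    old = list(get_current_vxids())
    for i in range(len(old)):
        if old[i] is None:
            continue                      # found uninteresting in a previous cycle
        if i > 0:
            newer = set(get_current_vxids())   # see if anything's changed
            for j in range(i, len(old)):
                if old[j] is not None and old[j] not in newer:
                    old[j] = None         # gone idle or finished: skip it
        if old[i] is not None:
            vxid_lock_wait(old[i])        # block until this vxid releases

waited = []
snapshots = [["1/5", "2/9", "4/1"]]       # first call returns three vxids
def get_current_vxids():
    vxids = snapshots[0]
    snapshots[0] = ["1/5"]                # 2/9 and 4/1 vanish on recheck
    return vxids

wait_for_old_snapshots(get_current_vxids, waited.append)
# only 1/5 is actually waited on; the others were pruned by the recheck
```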
On 2013-03-06 13:21:27 +0900, Michael Paquier wrote:
Please find attached updated patch realigned with your comments. You can
find my answers inline...
The only thing that needs clarification is the comment about
UNIQUE_CHECK_YES/UNIQUE_CHECK_NO. Except that all the other things are
corrected or adapted to what you wanted. I am also now including tests for
matviews.

On Wed, Mar 6, 2013 at 1:49 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-03-05 22:35:16 +0900, Michael Paquier wrote:
+ for (count = 0; count < num_indexes; count++)
+     index_insert(toastidxs[count], t_values, t_isnull,
+                  &(toasttup->t_self),
+                  toastrel,
+                  toastidxs[count]->rd_index->indisunique ?
+                  UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
The indisunique check looks like a copy & pasto to me, albeit not yours...

Yes, it is normally the same for all the indexes, but it looks more solid
to me to do it as it is. So unchanged.
Hm, if the toast indexes weren't unique anymore, loads of stuff would be
broken. Anyway, not your "fault".

I definitely cannot understand where you are going here. Could you be more
explicit? Why could this be a problem? Without my patch, a similar check is
used for toast indexes.
There's no problem. I just dislike the pointless check which caters for
a situation that doesn't exist...
Forget it, sorry.
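The loop under discussion inserts each new toast chunk into every toast index, choosing the uniqueness check per index. A toy simulation of that pattern (hypothetical classes, not PostgreSQL's index_insert API):

```python
# Sketch of the multi-index toast insertion: every toast index receives the
# entry, and the uniqueness check is selected from each index's own flag,
# mirroring the indisunique ? UNIQUE_CHECK_YES : UNIQUE_CHECK_NO choice.
class ToastIndex:
    def __init__(self, name, is_unique):
        self.name = name
        self.is_unique = is_unique
        self.entries = []

    def insert(self, key, tid, check_unique):
        if check_unique and any(k == key for k, _ in self.entries):
            raise ValueError(f"duplicate key in {self.name}")
        self.entries.append((key, tid))

def toast_save(indexes, key, tid):
    # Mirror of the patch's loop over toastidxs[]
    for idx in indexes:
        idx.insert(key, tid, check_unique=idx.is_unique)

idxs = [ToastIndex("pg_toast_1_index", True),
        ToastIndex("pg_toast_1_index_cct", True)]
toast_save(idxs, key=(42, 0), tid=100)
```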
+ if (count == 0)
+     snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+              OIDOldHeap);
+ else
+     snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_cct%d",
+              OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc), NewToastName);
+ count++;
+ }

Hm. It seems wrong that this layer needs to know about _cct.
Any other idea? For the time being I removed cct and added only a suffix
based on the index number...

Hm. It seems like throwing an error would be sufficient; that path is
only entered for shared catalogs, right? Having multiple toast indexes
would be a bug.

I don't think so. Even if those APIs are currently used only for catalog
tables, I do not believe that this function was designed to be used only
with shared catalogs. Removing the cct suffix makes sense though...
Forget what I said.
+ /*
+  * Index is considered as a constraint if it is PRIMARY KEY or EXCLUSION.
+  */
+ isconstraint = indexRelation->rd_index->indisprimary ||
+     indexRelation->rd_index->indisexclusion;

Don't unique constraints matter here?
No, they are not. Unique indexes are not counted as constraints in the case
of index_create. Previous versions of the patch did that, but there are
issues with unique indexes using expressions.

Hm. index_create's comment says:
* isconstraint: index is owned by PRIMARY KEY, UNIQUE, or EXCLUSION constraint

There are unique indexes that are constraints and some that are not.
Looking at ->indisunique is not sufficient to determine whether it is one
or not.

Hum... OK. I changed that to use a method based on get_index_constraint for
a given index. So if the constraint OID is invalid, it means that this
index has no constraint, and its concurrent entry won't create a constraint
in consequence. It is more stable this way.
Sounds good. Just to make that clear:
To get a unique index without constraint:
CREATE TABLE table_u(id int, data int);
CREATE UNIQUE INDEX table_u__data ON table_u(data);
To get a constraint:
ALTER TABLE table_u ADD CONSTRAINT table_u__id_unique UNIQUE(id);
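The distinction above (a unique index is rebuilt as a constraint only if a pg_constraint entry owns it, detected via get_index_constraint returning a valid OID) can be modeled as a toy lookup. The table and OID values below are illustrative, not real catalog contents; only InvalidOid = 0 matches PostgreSQL.

```python
# Toy model of the get_index_constraint() check: an index is treated as a
# constraint index only when a pg_constraint entry owns it, i.e. when the
# returned constraint OID is valid (non-zero).
INVALID_OID = 0
index_constraints = {
    "table_u_id_key": 16401,       # index owned by a UNIQUE constraint
    "table_u__data": INVALID_OID,  # plain unique index, no constraint
}

def is_constraint_index(index_name):
    # Unknown indexes default to "no owning constraint"
    return index_constraints.get(index_name, INVALID_OID) != INVALID_OID

print(is_constraint_index("table_u_id_key"))   # constraint-owned
print(is_constraint_index("table_u__data"))    # not a constraint
```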
+ /*
+  * Phase 3 of REINDEX CONCURRENTLY
+  *
+  * During this phase the concurrent indexes catch up with the INSERTs that
+  * might have occurred in the parent table and are marked as valid once done.
+  *
+  * We once again wait until no transaction can have the table open with
+  * the index marked as read-only for updates. Each index validation is done
+  * with a separate transaction to avoid keeping a transaction open for an
+  * unnecessarily long time.
+  */

Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.

Yes, you said the contrary in a previous review. The purpose of this
function is to first gather the locks and then wait for everything at once
to reduce possible conflicts.

You say:

+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * with a separate transaction to avoid keeping a transaction open for an
+ * unnecessarily long time.

Which doesn't seem to be done?
I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
not being necessary in this phase. Which I think is correct.

Regarding CacheInvalidateRelcacheByRelid at phase 3, I think that it is
needed. If we don't use it, the pg_index entries will be updated but not
the cache, which is incorrect.
A heap_update will cause cache invalidations to be sent.
Anyway, if I claimed otherwise, I think I was wrong:
The reason - I think - we need to wait here is that otherwise its not
guaranteed that all other backends see the index with ->isready
set. Which means they might add tuples which are invisible to the mvcc
snapshot passed to validate_index() (just created beforehand) which are
not yet added to the new index because those backends think the index is
not ready yet.
Any flaws in that logic?

Not that I can think of. In consequence, and I think we will agree on that:
I am removing WaitForMultipleVirtualLocks and adding a WaitForVirtualLock
on the parent relation for EACH index before building and validating it.
I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.
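Andres's point (one WaitForMultipleVirtualLocks for all indexes before phase 3, instead of one wait per index) can be illustrated with a small simulation. The functions below are stand-ins with counters, not the real lock-manager calls.

```python
# Sketch of "gather conflicting vxids once, wait on each, then validate
# every index without re-waiting" -- the scheme suggested above.
calls = {"get": 0, "wait": 0}
validated = []

def get_conflicts():
    calls["get"] += 1
    return ["1/42", "3/7"]        # simulated conflicting virtual xids

def wait(vxid):
    calls["wait"] += 1            # would block until vxid goes away

def validate(idx):
    validated.append(idx)         # would run validate_index() here

def wait_once_then_validate(indexes):
    for vxid in get_conflicts():  # single gathering of lockers
        wait(vxid)
    for idx in indexes:           # safe: no old locker remains
        validate(idx)

wait_once_then_validate(["ind1", "ind2", "ind3"])
# conflicts were gathered once, not once per index
```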
20130305_2_reindex_concurrently_v17.patch:
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with the INSERTs that
+ * might have occurred in the parent table and are marked as valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * with a separate transaction to avoid keeping a transaction open for an
+ * unnecessarily long time.
+ */
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
Thanks!
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
OK. Patches updated... Please see attached.
With all the work done on those patches, I suppose this is close to being
something clean...
On Wed, Mar 6, 2013 at 5:50 PM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2013-03-06 13:21:27 +0900, Michael Paquier wrote:
Hum... OK. I changed that to use a method based on get_index_constraint for
a given index. So if the constraint OID is invalid, it means that this
index has no constraint, and its concurrent entry won't create a constraint
in consequence. It is more stable this way.
Sounds good. Just to make that clear:
To get a unique index without constraint:
CREATE TABLE table_u(id int, data int);
CREATE UNIQUE INDEX table_u__data ON table_u(data);
To get a constraint:
ALTER TABLE table_u ADD CONSTRAINT table_u__id_unique UNIQUE(id);
OK no problem. Thanks for the clarification.
+ /*
+  * Phase 3 of REINDEX CONCURRENTLY
+  *
+  * During this phase the concurrent indexes catch up with the INSERTs that
+  * might have occurred in the parent table and are marked as valid once done.
+  *
+  * We once again wait until no transaction can have the table open with
+  * the index marked as read-only for updates. Each index validation is done
+  * with a separate transaction to avoid keeping a transaction open for an
+  * unnecessarily long time.
+  */

Maybe I am being dumb because I have the feeling I said differently in
the past, but why do we not need a WaitForMultipleVirtualLocks() here?
The comment seems to say we need to do so.

Yes, you said the contrary in a previous review. The purpose of this
function is to first gather the locks and then wait for everything at once
to reduce possible conflicts.

You say:

+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * with a separate transaction to avoid keeping a transaction open for an
+ * unnecessarily long time.

Which doesn't seem to be done?
I read back and afaics I only referred to CacheInvalidateRelcacheByRelid
not being necessary in this phase. Which I think is correct.

Regarding CacheInvalidateRelcacheByRelid at phase 3, I think that it is
needed. If we don't use it, the pg_index entries will be updated but not
the cache, which is incorrect.
A heap_update will cause cache invalidations to be sent.
Ok. removed it.
Anyway, if I claimed otherwise, I think I was wrong:
The reason - I think - we need to wait here is that otherwise its not
guaranteed that all other backends see the index with ->isready
set. Which means they might add tuples which are invisible to the mvcc
snapshot passed to validate_index() (just created beforehand) which are
not yet added to the new index because those backends think the index is
not ready yet.

Any flaws in that logic?

Not that I can think of. In consequence, and I think we will agree on that:
I am removing WaitForMultipleVirtualLocks and adding a WaitForVirtualLock
on the parent relation for EACH index before building and validating it.

I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.
OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks
also before phase 3.
Honestly, I am still not very comfortable with the fact that the ShareLock
wait on parent relation is done outside each index transaction for build
and validation... Changed as requested though...
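The agreed sequencing (each phase in its own short transaction, with a single wait between building and validating) can be sketched as a simulation. Names here are illustrative, not PostgreSQL functions; commits stand in for making indisready/indisvalid changes visible to other backends.

```python
# Rough sequencing sketch of the concurrent reindex phases discussed in
# this thread: per-index short transactions, one wait-for-lockers step
# between the build phase and the validation phase.
log = []

def run_in_transaction(action):
    log.append("begin")
    log.append(action)
    log.append("commit")          # makes flag updates visible to all backends

def reindex_concurrently(indexes):
    for idx in indexes:                   # phase 2: build each index
        run_in_transaction(f"build {idx}")
    log.append("wait-for-lockers")        # single wait before phase 3
    for idx in indexes:                   # phase 3: validate each index
        run_in_transaction(f"validate {idx}")

reindex_concurrently(["ind1", "ind2"])
# every validate happens after the one wait-for-lockers step
```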
--
Michael
Attachments:
20130306_1_remove_reltoastidxid_v4.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..6db6851 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 81c1be3..e1475e6 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..79af64f 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1337,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1397,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1436,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1464,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1491,15 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
+ bool found = false;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,37 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ toastidxs[i] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* If the index is valid, register it; it will be used for later processing */
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ found = true;
+ validtoastidx = toastidxs[i];
+ }
+ i++;
+ }
+
+ /* This should not happen, but check the case of no valid indexes */
+ if (!found)
+ {
+ /* No valid indexes found, so leave with an error */
+ elog(ERROR, "no valid indexes found for toast relation %s",
+ RelationGetRelationName(toastrel));
+ }
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1553,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1567,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1587,9 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1599,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for scan
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1646,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1661,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1679,17 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1708,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1797,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1816,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1839,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1884,17 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1935,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2032,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
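Stepping back from the hunks above: both toast_fetch_datum and toast_fetch_datum_slice now open every index of the toast relation, scan using only the first one, and close them all at the end. A minimal reviewer sketch of that pattern in plain C (FakeRelation and both function names are hypothetical stand-ins for the backend types and calls):

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for Relation; is_open mimics holding AccessShareLock. */
typedef struct
{
    unsigned oid;
    int is_open;
} FakeRelation;

/* Mirrors: RelationGetIndexList + palloc + the index_open loop. */
static FakeRelation *
open_toast_indexes(const unsigned *index_oids, int n, int *num_indexes)
{
    FakeRelation *idxs = malloc(n * sizeof(FakeRelation));
    int i;

    for (i = 0; i < n; i++)
    {
        idxs[i].oid = index_oids[i];
        idxs[i].is_open = 1;    /* index_open(oid, AccessShareLock) */
    }
    *num_indexes = n;
    return idxs;
}

/* Mirrors: systable_beginscan_ordered uses only toastidxs[0];
 * afterwards every index is closed and the array freed (pfree). */
static unsigned
scan_and_close(FakeRelation *idxs, int num_indexes)
{
    unsigned scanned_with = idxs[0].oid;
    int i;

    for (i = 0; i < num_indexes; i++)
        idxs[i].is_open = 0;    /* index_close(toastidxs[i], AccessShareLock) */
    free(idxs);
    return scanned_with;
}
```

The point of the shape: while a concurrent rebuild is in flight a toast relation can transiently have two indexes, but the ordered scan must still be driven by exactly one of them.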
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0ecfc78..043b279 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..d3e1da4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,61 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if both toast
+ * relations have a single index each.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, which cannot
+ * have multiple indexes on their toast relation, simply return
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1555,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1570,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast index name and the
+ * following entries have a numeric suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
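The renaming loop above encodes a simple scheme: the first toast index keeps the historical name, later ones get a numeric suffix. A reviewer sketch of just that naming rule (the helper name is hypothetical; the format strings match the patch):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64          /* PostgreSQL's default identifier length */

/* Reproduce the toast index naming used by finish_heap_swap's loop:
 * count == 0 keeps the classic pg_toast_<oid>_index name, later
 * indexes get pg_toast_<oid>_index_<n>. */
static void
toast_index_name(char *buf, unsigned heap_oid, int count)
{
    if (count == 0)
        snprintf(buf, NAMEDATALEN, "pg_toast_%u_index", heap_oid);
    else
        snprintf(buf, NAMEDATALEN, "pg_toast_%u_index_%d", heap_oid, count);
}
```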
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..0d6f5c0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8678,7 +8678,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8686,6 +8685,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8729,7 +8730,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8808,8 +8810,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 8963266..3dd2fda 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -577,8 +577,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -590,7 +590,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the available indexes */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
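The dbsize.c change turns a single-index size lookup into a sum over every fork of every toast index. A small sketch of that accumulation, with a flat size table standing in for the per-fork calculate_relation_size calls (function name and table are hypothetical):

```c
#include <assert.h>

#define MAX_FORKNUM 3           /* main, FSM, VM and init forks */

/* Sum every fork of every toast index, as calculate_toast_table_size
 * now does by iterating over the relation's whole index list. */
static long long
total_toast_index_size(const long long fork_sizes[][MAX_FORKNUM + 1],
                       int num_indexes)
{
    long long size = 0;
    int i, fork;

    for (i = 0; i < num_indexes; i++)
        for (fork = 0; fork <= MAX_FORKNUM; fork++)
            size += fork_sizes[i][fork];
    return size;
}
```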
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e6c85ac..f15e6a2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130306_2_reindex_concurrently_v19.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..051ebd7 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,12 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
</para>
</listitem>
@@ -139,6 +142,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +249,112 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index whose storage will replace the
+ one being rebuilt is first entered into the system catalogs in one
+ transaction, then two table scans occur in two more transactions to make
+ the new index valid for the other backends. Once this is done, the
+ storage of the old and fresh indexes is swapped, and the concurrent
+ index is marked as invalid in a third transaction. Finally, two
+ additional transactions are used to mark the concurrent index as not
+ ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending
+ with the suffix <literal>_cct</>. This works as well with indexes of
+ toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +386,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild the indexes of a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
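Before moving on to the index.c changes: the recovery procedure described in the documentation above can be summarized in a hypothetical psql session (object names are made up; command order follows the synopsis in this patch):

```sql
-- Hypothetical session; "tab" and "idx_cct" are illustrative names.
-- A concurrent rebuild that hits, say, a uniqueness violation fails,
-- leaving behind an index with the _cct suffix marked INVALID:
REINDEX TABLE CONCURRENTLY tab;

-- Drop the leftover invalid index, fix the offending data, then retry:
DROP INDEX idx_cct;
REINDEX TABLE CONCURRENTLY tab;
```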
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..b2895f2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index used as a duplicate of an existing
+ * index during a concurrent operation. This index can also belong to a
+ * toast relation. Sufficient locks are normally already taken on the
+ * related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild; there is no other way to ask for it in the grammar
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1095,6 +1105,416 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one, to be used for concurrent
+ * operations. The new index is only inserted into the catalogs and still
+ * needs to be built afterwards. This is called during concurrent index
+ * processing. The heap relation on which the index is based must be closed
+ * by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* The concurrent index uses the same index information as the old index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initially deferred; this depends on
+ * its parent constraint, if any.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, needed to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum != 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check whether the chosen name conflicts with existing names, and
+ * adjust it if necessary.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken
+ * during this operation so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index with the new index in a concurrent context. For the
+ * time being this only switches the relfilenode of the two indexes. If extra
+ * operations become necessary during a concurrent swap, they should be added
+ * here. AccessExclusiveLock is taken on the swapped index relations and held
+ * until the end of the transaction in which this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * If the caller provided a locktag, we must first wait until no
+ * running transaction could be using the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initiating an
+ * index drop in a concurrent context, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent index process.
+ * Deletion has to go through performDeletion, or the dependencies of the
+ * index would not get dropped. At this point the indexes are already
+ * considered invalid and dead, so they can be dropped without using any
+ * concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not still alive; if it were,
+ * it might be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ Assert(!indexForm->indislive);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are now sure to have a dead index, so begin the drop process:
+ * register the constraint or the index itself for deletion.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1744,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1825,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1854,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1868,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, true, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..928d6c4 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,27 +681,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,74 +740,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,510 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation can
+ * be either an index or a table. If a table is specified, each reindexing
+ * step is performed at the same time on all the table's indexes, as well as
+ * on its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including the indexes of
+ * its associated toast table. If the relkind is an index, the index
+ * itself will be rebuilt. The locks taken on the parent relations and
+ * the involved indexes are kept until this transaction commits, to
+ * protect against schema changes that might occur before a session lock
+ * is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be rebuilt concurrently.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return an error if the relation type is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes to rebuild, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create a new index, based on the same definition as the
+ * old index, that is only registered in the catalogs and will be built
+ * later. These operations are done at the same time for all the indexes
+ * of a parent relation, including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation; it may be a plain or a toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NIL,
+ NIL,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is needed on
+ * it as well
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each concurrent relation from being
+ * dropped, then close the relations. Each lockrelid must be palloc'd,
+ * as the list entries need to survive this loop iteration. The
+ * lockrelid of the parent relation is not saved here to avoid taking
+ * multiple locks on the same relation; we rely instead on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the lock and locktag of each parent relation for the following
+ * visibility waits on other backends that might conflict with this
+ * session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add the lockrelid of the parent relation to the list of locked
+ * relations. It is palloc'd so that the entry survives this loop
+ * iteration.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transaction will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the old
+ * index and its concurrent copy, to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid having open transactions for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait on the parent
+ * relations until no running transaction could still have the parent
+ * table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Index relation has been closed by the previous commit, so reopen it
+ * and grab what we need before closing it again; indexRel must not be
+ * dereferenced after index_close().
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ relOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(relOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table, and are marked as valid
+ * once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * the concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid, as it contains all the necessary
+ * tuples. However, it might not have taken into account tuples deleted
+ * before the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation, invalidating the relation
+ * cache of the associated relations. ShareUpdateExclusiveLock is taken
+ * here, consistent with the session-level locks already held, to reduce
+ * the likelihood of deadlock.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Mark the old index as invalid */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes. They need to be marked as dead so that transactions that
+ * might still use them stop doing so. Each operation is performed in a
+ * separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and mark it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session already holds sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies of the old indexes will not be
+ * dropped with them. The internal mechanism of DROP INDEX CONCURRENTLY
+ * is not used, as here the indexes are already considered dead and
+ * invalid, so they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Perform the deletion of this concurrent index */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get a fresh snapshot for the end of the process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1941,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1968,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2087,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2166,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2211,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2226,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * The CONCURRENTLY option is not allowed for system catalogs, but it
+ * is allowed for a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2318,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself uses
+ * them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0d6f5c0..e11f3f6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and this code path cannot be reached by CREATE INDEX
+ * CONCURRENTLY, as that feature is not available for exclusion
+ * constraints. Hence this code path can only be reached by REINDEX
+ * CONCURRENTLY. In that case an identical index exists in parallel to
+ * this one, so we can bypass this check: it has already been done on
+ * the index existing in parallel. If exclusion constraints are ever
+ * supported by CREATE INDEX CONCURRENTLY, this will need to be
+ * revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the
+ * relations referred to by the given LOCKTAGs. To do this, inquire
+ * which xacts currently would conflict with lockmode on each relation
+ * -- ie, which ones have a lock that permits writing the relation --
+ * then wait for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not account for tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..db2a531 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
OK. Patches updated... Please see attached.
With all the work done on those patches, I suppose this is close to being
something clean...
Yes, it's looking good. There are loads of improvements possible but
those can very well be made incrementally.
I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.
OK, sorry for the confusion. I added a call to WaitForMultipleVirtualLocks
also before phase 3.
Honestly, I am still not very comfortable with the fact that the ShareLock
wait on parent relation is done outside each index transaction for build
and validation... Changed as requested though...
Could you detail your concerns a bit? I tried to think it through
multiple times now and I still can't see a problem. The lock only
ensures that nobody has the relation open with the old index definition
in mind...
Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
OK. Patches updated... Please see attached.
With all the work done on those patches, I suppose this is close to being
something clean...
Yes, it's looking good. There are loads of improvements possible but
those can very well be made incrementally.
I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.
OK, sorry for the confusion. I added a call to
WaitForMultipleVirtualLocks
also before phase 3.
Honestly, I am still not very comfortable with the fact that the ShareLock
wait on parent relation is done outside each index transaction for build
and validation... Changed as requested though...
Could you detail your concerns a bit? I tried to think it through
multiple times now and I still can't see a problem. The lock only
ensures that nobody has the relation open with the old index definition
in mind...
I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock
wait is made inside the build and validation transactions. Was there any
particular reason why CREATE INDEX CONCURRENTLY wait is done inside a
transaction block?
That's my only concern.
--
Michael
On 2013-03-06 21:19:57 +0900, Michael Paquier wrote:
On Wed, Mar 6, 2013 at 9:09 PM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2013-03-06 20:59:37 +0900, Michael Paquier wrote:
OK. Patches updated... Please see attached.
With all the work done on those patches, I suppose this is close to being
something clean...
Yes, it's looking good. There are loads of improvements possible but
those can very well be made incrementally.
I have the feeling we are talking past each other. Unless I miss
something *there is no* WaitForMultipleVirtualLocks between phase 2 and
3. But one WaitForMultipleVirtualLocks for all would be totally
sufficient.
OK, sorry for the confusion. I added a call to
WaitForMultipleVirtualLocks
also before phase 3.
Honestly, I am still not very comfortable with the fact that the ShareLock
wait on parent relation is done outside each index transaction for build
and validation... Changed as requested though...
Could you detail your concerns a bit? I tried to think it through
multiple times now and I still can't see a problem. The lock only
ensures that nobody has the relation open with the old index definition
in mind...
I am making a comparison with CREATE INDEX CONCURRENTLY where the ShareLock
wait is made inside the build and validation transactions. Was there any
particular reason why CREATE INDEX CONCURRENTLY wait is done inside a
transaction block?
That's my only concern.
Well, it needs to be executed in a transaction because it needs a valid
resource owner and a previous CommitTransactionCommand() will leave that
at NULL. And there is no reason in the single-index case of CREATE INDEX
CONCURRENTLY to do it in a separate transaction.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
OK. Patches updated... Please see attached.
I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
an index which was not marked as INVALID remained unexpectedly.
=# CREATE TABLE hoge (i int primary key);
CREATE TABLE
=# INSERT INTO hoge VALUES (generate_series(1,10));
INSERT 0 10
=# SET statement_timeout TO '1s';
SET
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
Table "public.hoge"
Column | Type | Modifiers
--------+---------+-----------
i | integer | not null
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
Table "public.hoge"
Column | Type | Modifiers
--------+---------+-----------
i | integer | not null
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
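For reference, leftover invalid indexes like these can also be listed
straight from the catalogs; this is a standard pg_index query, not
something specific to the patch:

```sql
-- List all invalid indexes in the current database.
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass  AS table_name
FROM pg_index
WHERE NOT indisvalid;
```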
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop the
concurrent index" cannot actually drop the index. In this case, you need
to issue "ALTER TABLE ... DROP CONSTRAINT ..." to recover the situation.
I think this information should be documented.
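As a minimal sketch of that recovery path, assuming the leftover index is
the hoge_pkey_cct primary-key index from the example above (names taken
from that session, not from the patch's documentation):

```sql
-- A plain DROP INDEX is rejected here because the invalid index still
-- backs a constraint; drop the constraint instead, which removes the
-- index along with it:
ALTER TABLE hoge DROP CONSTRAINT hoge_pkey_cct;

-- ... then retry the concurrent rebuild of the remaining indexes:
REINDEX TABLE CONCURRENTLY hoge;
```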
Regards,
--
Fujii Masao
On 2013-03-07 02:09:49 +0900, Fujii Masao wrote:
On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
OK. Patches updated... Please see attached.
I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
I found that the index which was not marked as INVALID remained unexpectedly.
That's to be expected. Indexes need to be valid *before* we can drop the
old one. So if you abort at the right moment you will see those, and
that's imo fine.
=# CREATE TABLE hoge (i int primary key);
CREATE TABLE
=# INSERT INTO hoge VALUES (generate_series(1,10));
INSERT 0 10
=# SET statement_timeout TO '1s';
SET
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
Table "public.hoge"
Column | Type | Modifiers
--------+---------+-----------
i | integer | not null
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
Table "public.hoge"
Column | Type | Modifiers
--------+---------+-----------
i | integer | not null
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
Huh, why did that go through? It should have errored out?
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop the
concurrent index" cannot actually drop the index. In this case, you need
to issue "ALTER TABLE ... DROP CONSTRAINT ..." to recover the situation.
I think this information should be documented.
I think we just shouldn't set ->isprimary on the temporary indexes. Now
that we switch only the relfilenodes and not the whole index, that should
be perfectly fine.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
Huh, why did that go through? It should have errored out?
I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
be marked as invalid, I think.
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop the
concurrent index" cannot actually drop the index. In this case, you need
to issue "ALTER TABLE ... DROP CONSTRAINT ..." to recover the situation.
I think this information should be documented.
I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.
Sounds good. But what about other constraint cases, like a unique constraint?
Can those other cases also be resolved by not setting ->isprimary?
Regards,
--
Fujii Masao
On 2013-03-07 02:34:54 +0900, Fujii Masao wrote:
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
Huh, why did that go through? It should have errored out?
I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
be marked as invalid, I think.
Hm. Yea.
I am still not sure yet why hoge_pkey_cct_cct sprang into existence, but
that hoge_pkey_cct1 springs into existence makes sense.
I see a problem here: there is a moment between phase 3 and 4 where
both the old and the new indexes are valid and ready. That's not good,
because if we abort at that moment we essentially have doubled the
number of indexes.
Options:
a) we live with it
b) we only mark the new index as valid within phase 4. That should be
fine I think?
c) we invent some other state to mark indexes that are in-progress to
replace another one.
I guess b) seems fine?
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop the
concurrent index" cannot actually drop the index. In this case, you need
to issue "ALTER TABLE ... DROP CONSTRAINT ..." to recover the situation.
I think this information should be documented.
I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.
Sounds good. But what about other constraint cases, like a unique constraint?
Can those other cases also be resolved by not setting ->isprimary?
Unique indexes can exist without a constraint attached, so that's fine. I
need to read a bit more code to check whether it's safe to unset it,
although indisexclusion and indimmediate might be more important.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Mar 7, 2013 at 2:09 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Mar 6, 2013 at 8:59 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
OK. Patches updated... Please see attached.
I found odd behavior. After I made REINDEX CONCURRENTLY fail twice,
I found that the index which was not marked as INVALID remained
unexpectedly.
=# CREATE TABLE hoge (i int primary key);
CREATE TABLE
=# INSERT INTO hoge VALUES (generate_series(1,10));
INSERT 0 10
=# SET statement_timeout TO '1s';
SET
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
Table "public.hoge"
Column | Type | Modifiers
--------+---------+-----------
i | integer | not null
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
=# REINDEX TABLE CONCURRENTLY hoge;
ERROR: canceling statement due to statement timeout
=# \d hoge
Table "public.hoge"
Column | Type | Modifiers
--------+---------+-----------
i | integer | not null
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
Invalid indexes cannot be reindexed concurrently and are simply bypassed
during the process, so _cct_cct has no reason to exist. For example, here
is what I get with a relation having an invalid index:
ioltas=# \d aa
Table "public.aa"
Column | Type | Modifiers
--------+---------+-----------
a | integer |
Indexes:
"aap" btree (a)
"aap_cct" btree (a) INVALID
ioltas=# reindex table concurrently aa;
WARNING: cannot reindex concurrently invalid index "public.aap_cct",
skipping
REINDEX
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop the
concurrent index" cannot actually drop the index. In this case, you need
to issue "ALTER TABLE ... DROP CONSTRAINT ..." to recover the situation.
I think this information should be documented.
You are right. I'll add a note in the documentation about that. Personally,
I find it more intuitive to use DROP CONSTRAINT for a primary key, as the
image I have of a concurrent index is a twin of the index it rebuilds.
--
Michael
On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
Huh, why did that go through? It should have errored out?
I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
be marked as invalid, I think.
CHECK_FOR_INTERRUPTS calls were not added at each phase, and they are
needed in case the process is interrupted by the user. This has been
mentioned in a past review but was missing, so it might have slipped out
during a refactoring or something. Btw, I am surprised to see that this
*_cct_cct index has been created knowing that hoge_pkey_cct is invalid.
I tried with the latest version of the patch, and even the patch
attached, but couldn't reproduce it.
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop the
concurrent index" cannot actually drop the index. In this case, you need
to issue "ALTER TABLE ... DROP CONSTRAINT ..." to recover the situation.
I think this information should be documented.
I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.
Sounds good. But what about other constraint cases, like a unique constraint?
Can those other cases also be resolved by not setting ->isprimary?
We should stick with the concurrent index being a twin of the index it
rebuilds, for consistency.
Also, I think that it is important from the session viewpoint to perform
the swap with 2 valid indexes. If the process fails just before swapping
indexes, the user might want to do the swap himself: drop the old index,
then use the concurrent one.
Other opinions welcome.
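As a sketch of that manual completion, with hypothetical names (ind is the
old index, ind_cct its valid concurrent twin; a constraint-backed index
such as a primary key would need ALTER TABLE ... DROP CONSTRAINT rather
than a plain DROP INDEX):

```sql
-- Both indexes are valid at this point, so finish the swap by hand:
DROP INDEX ind;                      -- drop the old index
ALTER INDEX ind_cct RENAME TO ind;   -- give the twin the original name
```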
--
Michael
Attachments:
20130306_1_remove_reltoastidxid_v4.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..6db6851 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 81c1be3..e1475e6 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..79af64f 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1337,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1397,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1436,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1464,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1491,15 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
+ bool found = false;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,37 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ toastidxs[i] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* If index is valid register it, it will be used for next processes */
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ found = true;
+ validtoastidx = toastidxs[i];
+ }
+ i++;
+ }
+
+ /* This should not happen, but check the case of no valid indexes */
+ if (!found)
+ {
+ /* No valid indexes found, so leave with an error */
+ elog(ERROR, "no valid indexes found for toast relation %s",
+ RelationGetRelationName(toastrel));
+ }
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1553,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1567,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1587,9 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1599,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for scan
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1646,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1661,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1679,17 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1708,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1797,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1816,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1839,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1884,17 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1935,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2032,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0ecfc78..043b279 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 9b33929..0f3b45f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1077,7 +1077,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1256,7 +1255,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..d3e1da4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,61 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if both relations have
+ * exactly one index.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, which cannot
+ * have multiple indexes on their toast relation, simply raise
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1555,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1570,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries have a suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a55e02..0d6f5c0 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8678,7 +8678,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8686,6 +8685,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8729,7 +8730,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8808,8 +8810,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 8963266..3dd2fda 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -577,8 +577,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -590,7 +590,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the available indexes */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e6c85ac..f15e6a2 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
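For reference, with reltoastidxid removed, the toast index of a relation is now looked up through pg_index, exactly as the updated pg_statio_all_tables view does. A minimal sketch of such a lookup (the table name "my_table" is hypothetical):

```sql
-- List the toast index(es) of a given table now that pg_class.reltoastidxid
-- is gone; the join goes through pg_index instead of pg_class.
SELECT x.indexrelid::regclass
FROM pg_class c
     JOIN pg_class t ON c.reltoastrelid = t.oid
     JOIN pg_index x ON t.oid = x.indrelid
WHERE c.oid = 'my_table'::regclass;
```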
Attachment: 20130307_2_reindex_concurrently_v20.patch
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..fc216d3 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUSION</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,112 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete, as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index whose storage will replace the one
+ to be rebuilt is first entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions to make the new
+ index valid for the other backends. Once this is done, the old
+ and new indexes are swapped, and the index used during the process is
+ marked as invalid in a third transaction. Finally, two additional
+ transactions are used to mark the concurrent index as not ready and then
+ drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and run <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending with
+ the suffix <literal>_cct</>. This also works with indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +395,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
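As a usage sketch of the documentation above (the index and table names are the hypothetical ones from the synopsis), a concurrent rebuild followed by a check for a leftover invalid index could look like:

```sql
-- Rebuild an index concurrently, then look for any invalid "_cct" leftover
-- that a failed run would leave behind ("ind" and "tab" are hypothetical).
REINDEX INDEX ind CONCURRENTLY;
SELECT indexrelid::regclass
FROM pg_index
WHERE indrelid = 'tab'::regclass AND NOT indisvalid;
```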
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0f3b45f..b2895f2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that duplicates an existing index,
+ * as done during a concurrent reindex operation. Such an index can
+ * also be on a toast relation. Sufficient locks are normally already taken
+ * on the related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently supported only during a concurrent index
+ * rebuild, but there is no way to ask for it in the grammar otherwise
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1095,6 +1105,416 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into the catalogs and needs to be built
+ * later on. This is called during concurrent index processing. The heap
+ * relation on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get expressions associated with this index to build the column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum != 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the picked name conflicts with existing names, and
+ * adjust it if so.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken
+ * during this operation so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index with the new index in a concurrent context. For the
+ * time being this is done by switching the relfilenode of the two indexes.
+ * If extra operations become necessary during a concurrent swap, they should
+ * be added here. AccessExclusiveLock is taken on the swapped index relations
+ * and held until the end of the transaction calling this function.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * If a locktag is given, wait until no running transaction could be
+ * using the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of its
+ * parent relation. This function should be called when initiating a
+ * concurrent index drop, before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation, Oid indexOid)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion has to go through performDeletion, or the dependencies
+ * of the index would not get dropped. At this point all the indexes are
+ * already considered invalid and dead, so they can be dropped without using
+ * any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not alive anymore; if it were,
+ * it might still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ Assert(!indexForm->indislive);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1744,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1825,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1854,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1868,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..c14e4bd 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,27 +681,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,74 +740,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,529 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is done in parallel with all the table's indexes as well as its dependent
+ * toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes are rebuilt, including the indexes of its
+ * toast relation. If the relkind is an index, the index itself is
+ * rebuilt. The locks taken on the parent relations and the involved
+ * indexes are kept until this transaction is committed, to protect
+ * against schema changes that might occur before the session lock is
+ * taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be rebuilt concurrently.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes to rebuild, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. For
+ * each index rebuilt we first need to create a new index based on the
+ * same definition; it is only registered in the catalogs here and will
+ * be built later. All these operations can be done at once for a parent
+ * relation and all of its indexes, including toast indexes.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation; it might be a toast or plain relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the concurrent index relation; a lock on it is needed as
+ * well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each concurrent relation from being
+ * dropped, then close the relations. The entries appended must be
+ * palloc'd copies: the local variable does not survive this loop
+ * iteration. The lockrelid of the parent relation is not taken here to
+ * avoid multiple locks on the same relation; instead we rely on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(lockrelid)), &lockrelid, sizeof(lockrelid)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(lockrelid)), &lockrelid, sizeof(lockrelid)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock of each parent relation for the subsequent
+ * visibility checks against other backends that might conflict with
+ * this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the parent relation's lockrelid to the list of
+ * locked relations; the local variable does not survive this loop
+ * iteration.
+ */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(lockrelid)), &lockrelid, sizeof(lockrelid)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the indexes.
+ * This prevents them from making incompatible HOT updates. The new
+ * indexes are marked as not ready and invalid, so that no other
+ * transaction will try to use them for INSERT or SELECT.
+ *
+ * Before committing, take a session-level lock on each relation, each
+ * index and its concurrent copy, to ensure that none of them is dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping a transaction open for an unnecessarily long time. Each
+ * concurrent build produces the index that will replace an old one.
+ * Before building, we must wait until no running transaction could have
+ * the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so reopen
+ * it. Save what we need before closing again: the Relation must not be
+ * dereferenced after index_close().
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ heapOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(heapOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index to mark it as ready
+ * for inserts. Once we commit this transaction, any new transaction
+ * that opens the table must insert new entries into the index for
+ * insertions and non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any tuples
+ * inserted in the parent table while they were being built, and are
+ * marked valid once done.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction, to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used to validate the
+ * concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * This concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not reflect tuples deleted just before
+ * the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /*
+ * Concurrent index can now be marked as valid -- update pg_index
+ * entries.
+ */
+ index_set_state_flags(indOid, INDEX_CREATE_SET_VALID);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The old
+ * index is marked as invalid once this is done, making it not usable
+ * by other backends once its associated transaction is committed.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation. ShareUpdateExclusiveLock is
+ * taken here, consistent with the session-level locks already held,
+ * to reduce the likelihood of deadlock.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /*
+ * Mark the old index as invalid. This needs to happen before the
+ * entries are actually swapped, as the first action of this
+ * transaction.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+ heap_close(indexParentRel, ShareUpdateExclusiveLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes, so they must be marked as dead to stop transactions that
+ * might still use them. Each operation is performed with a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation as
+ * this session already holds sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion or related dependencies will not be dropped for the old
+ * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not used
+ * as here the indexes are already considered as dead and invalid, so they
+ * will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this concurrent index within its own transaction */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
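The per-phase transaction pattern implemented above can be modeled as a small
standalone state machine. This is an illustrative sketch only -- none of the
names below are PostgreSQL APIs -- showing why each phase commits its own
transaction before the next one starts:

```c
#include <assert.h>

/*
 * Illustrative model of the REINDEX CONCURRENTLY pipeline: each phase
 * commits its own transaction so the index-state change it makes is
 * visible to every other backend before the next phase begins.  The
 * names here are invented for this sketch, not PostgreSQL APIs.
 */
typedef enum
{
	IDX_CREATED,	/* phase 1: new index exists, invalid and not ready */
	IDX_READY,		/* phase 2: built, indisready set */
	IDX_VALID,		/* phase 3: validated, indisvalid set */
	IDX_SWAPPED,	/* phase 4: relfilenodes swapped, old index invalid */
	IDX_DEAD,		/* phase 5: old index marked dead */
	IDX_DROPPED		/* phase 6: old index dropped */
} IndexState;

/* One call models one StartTransactionCommand()/CommitTransactionCommand() pair */
static IndexState
advance_phase(IndexState s)
{
	return (s == IDX_DROPPED) ? IDX_DROPPED : (IndexState) (s + 1);
}

/* Count the committed transactions needed from creation to drop */
static int
transactions_to_complete(void)
{
	int			n = 0;
	IndexState	s = IDX_CREATED;

	while (s != IDX_DROPPED)
	{
		s = advance_phase(s);
		n++;
	}
	return n;
}
```

Under this model, five committed transactions separate index creation from the
final drop, which is why an interruption at any point leaves a self-consistent
(if invalid) index behind rather than a half-swapped one.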
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1960,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1987,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2106,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2185,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2230,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2245,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed on system catalogs, but it
+ * is allowed on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2337,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or a plain process.
+ * System relations cannot be reindexed concurrently, but they
+ * still need to be reindexed (including pg_class) with the plain
+ * process, as they could be corrupted and the concurrent process
+ * itself relies on them. This does not include toast relations,
+ * which are reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0d6f5c0..e11f3f6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and this code path cannot be taken by CREATE INDEX
+ * CONCURRENTLY as that feature is not available for exclusion
+ * constraints, so this code path can only be taken by REINDEX
+ * CONCURRENTLY. In this case a twin of this index exists in parallel,
+ * so we can bypass this check as it has already been done on the
+ * other index. If exclusion constraints are supported in the future
+ * for CREATE INDEX CONCURRENTLY, this should be removed or revisited
+ * for that purpose.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the
+ * relations referred to by the given LOCKTAGs. To do this, inquire
+ * which xacts currently would conflict with lockmode on each relation
+ * -- ie, which ones have a lock that permits writing the relation --
+ * then wait for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
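The collect-then-wait pattern above can be sketched in isolation: the
conflicting transaction list is gathered for every lock tag first, and only
then is each entry waited on. A minimal model, with integer ids standing in
for VirtualTransactionIds (not PostgreSQL code):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal model of the collect-then-wait pattern: conflict lists for
 * all locks are snapshotted before any waiting starts, mirroring how
 * the GetLockConflicts() results are gathered before VirtualXactLock()
 * is called on each entry.  Integer ids stand in for
 * VirtualTransactionIds; a zero terminates each list.
 */
static int
count_waits(const int *const *conflict_lists, int nlists)
{
	int			waits = 0;

	for (int i = 0; i < nlists; i++)
	{
		/* each nonzero entry models one blocking VirtualXactLock() call */
		for (const int *xid = conflict_lists[i]; *xid != 0; xid++)
			waits++;
	}
	return waits;
}
```

Collecting all conflict lists up front means later waits act on a fixed
snapshot of lock holders rather than a moving target.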
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was
+ * taken. Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
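The re-check optimization in the loop above -- forget any old vxid that no
longer shows up in a fresh listing before blocking on it -- can be isolated
as a small pure function. This is an illustrative model with plain ints for
vxids, not PostgreSQL code:

```c
#include <assert.h>

/*
 * Model of the re-check step in WaitForOldSnapshots: starting at index
 * 'start', zero out (forget) every entry of old[] that is absent from
 * newer[], since such a transaction no longer has an old snapshot we
 * must wait for.  Zero marks an already-invalidated vxid.  Returns the
 * number of entries forgotten.  Plain ints stand in for
 * VirtualTransactionIds.
 */
static int
forget_missing(int *old, int n_old, int start, const int *newer, int n_newer)
{
	int			forgotten = 0;

	for (int j = start; j < n_old; j++)
	{
		int			k;

		if (old[j] == 0)
			continue;			/* found uninteresting in a previous cycle */
		for (k = 0; k < n_newer; k++)
		{
			if (old[j] == newer[k])
				break;			/* still running with an old snapshot */
		}
		if (k >= n_newer)
		{
			old[j] = 0;			/* gone: no need to wait on it */
			forgotten++;
		}
	}
	return forgotten;
}
```

This is why a backend that goes idle with xmin zero stops being waited on:
it simply disappears from the refreshed listing before the next blocking wait.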
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..db2a531 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,26 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation, Oid indexOid);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?
I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
be marked as invalid, I think.

CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in
case the process is interrupted by the user. This has been mentioned in a
past review but it was missing, so it might have slipped out during a
refactoring or something. Btw, I am surprised to see that this *_cct_cct
index has been created knowing that hoge_pkey_cct is invalid. I tried with
the latest version of the patch and even the patch attached but couldn't
reproduce it.
The strange thing about "hoge_pkey_cct_cct" is that it seems to imply
that an invalid index was reindexed concurrently?
But I don't see how it could happen either. Fujii, can you reproduce it?
+ The recommended recovery method in such cases is to drop the
concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop
the concurrent index" cannot actually drop the index. In this case, you
need to issue "alter table ... drop constraint ..." to recover the
situation. I think this information should be documented.

I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.

Sounds good. But what about other constraint cases like unique
constraints? Can those other cases also be resolved by not setting
->isprimary?

We should stick with the concurrent index being a twin of the index it
rebuilds for consistency.
I don't think it's legal. We cannot simply have two indexes with
'indisprimary'. Especially not if both are valid.
Also, there will be no pg_constraint row that refers to it, which
violates very valid expectations that both users and pg may have.
Also, I think that it is important from the session viewpoint to perform a
swap with 2 valid indexes. If the process fails just before swapping
indexes, the user might want to do the swap himself, drop the old index,
and then use the concurrent one.
The most likely outcome will be to rerun REINDEX CONCURRENTLY, which
will then reindex one more index since it now has the old valid index
and the new valid index. Also, I don't think it's fair game to expose
indexes that used to belong to a constraint, without a constraint
supporting it, as valid indexes.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao <masao.fujii@gmail.com>
wrote:
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)

Huh, why did that go through? It should have errored out?
I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
be marked as invalid, I think.

CHECK_FOR_INTERRUPTS were not added at each phase and they are needed in
case the process is interrupted by the user. This has been mentioned in a
past review but it was missing, so it might have slipped out during a
refactoring or something. Btw, I am surprised to see that this *_cct_cct
index has been created knowing that hoge_pkey_cct is invalid. I tried with
the latest version of the patch and even the patch attached but couldn't
reproduce it.

The strange thing about "hoge_pkey_cct_cct" is that it seems to imply
that an invalid index was reindexed concurrently?

But I don't see how it could happen either. Fujii, can you reproduce it?
Curious about that also.
+      The recommended recovery method in such cases is to drop the
+      concurrent index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop
the concurrent index" cannot actually drop the index. In this case, you
need to issue "alter table ... drop constraint ..." to recover the
situation. I think this information should be documented.
I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.
Sounds good. But what about other constraint cases, like a unique
constraint?
Those other cases also can be resolved by not setting ->isprimary?
We should stick with the concurrent index being a twin of the index it
rebuilds for consistency.
I don't think it's legal. We cannot simply have two indexes with
'indisprimary' set, especially not if both are valid.
Also, there will be no pg_constraint row that refers to it, which
violates very valid expectations that both users and pg may have.
So what to do with that?
Mark the concurrent index as valid, then validate it and finally mark it as
invalid inside the same transaction at phase 4?
That's moving 2 lines of code...
--
Michael
On Thu, Mar 7, 2013 at 9:48 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-03-07 05:26:31 +0900, Michael Paquier wrote:
On Thu, Mar 7, 2013 at 2:34 AM, Fujii Masao <masao.fujii@gmail.com>
wrote:
On Thu, Mar 7, 2013 at 2:17 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
Indexes:
"hoge_pkey" PRIMARY KEY, btree (i)
"hoge_pkey_cct" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct1" PRIMARY KEY, btree (i) INVALID
"hoge_pkey_cct_cct" PRIMARY KEY, btree (i)
Huh, why did that go through? It should have errored out?
I'm not sure why. Anyway hoge_pkey_cct_cct should not appear or should
be marked as invalid, I think.
CHECK_FOR_INTERRUPTS was not added at each phase, and it is needed in
case the process is interrupted by the user. This has been mentioned in
a past review but it was missing, so it might have slipped out during a
refactoring or something. Btw, I am surprised to see that this *_cct_cct
index has been created knowing that hoge_pkey_cct is invalid. I tried
with the latest version of the patch and even the patch attached but
couldn't reproduce it.
The strange thing about "hoge_pkey_cct_cct" is that it seems to imply
that an invalid index was reindexed concurrently?
But I don't see how it could happen either. Fujii, can you reproduce it?
Curious about that also.
+      The recommended recovery method in such cases is to drop the
+      concurrent index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop
the concurrent index" cannot actually drop the index. In this case, you
need to issue "alter table ... drop constraint ..." to recover the
situation. I think this information should be documented.
I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.
Sounds good. But what about other constraint cases, like a unique
constraint?
Those other cases also can be resolved by not setting ->isprimary?
We should stick with the concurrent index being a twin of the index it
rebuilds for consistency.
I don't think it's legal. We cannot simply have two indexes with
'indisprimary' set, especially not if both are valid.
Also, there will be no pg_constraint row that refers to it, which
violates very valid expectations that both users and pg may have.
So what to do with that?
Mark the concurrent index as valid, then validate it and finally mark it
as invalid inside the same transaction at phase 4?
That's moving 2 lines of code...
Sorry, phase 4 is the swap phase. Validation happens at phase 3.
--
Michael
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:
The strange thing about "hoge_pkey_cct_cct" is that it seems to imply
that an invalid index was reindexed concurrently?
But I don't see how it could happen either. Fujii, can you reproduce it?
Yes, I can even with the latest version of the patch. The test case to
reproduce it is:
(Session 1)
CREATE TABLE hoge (i int primary key);
INSERT INTO hoge VALUES (generate_series(1,10));
(Session 2)
BEGIN;
SELECT * FROM hoge;
(keep this session as it is)
(Session 1)
SET statement_timeout TO '1s';
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
Regards,
--
Fujii Masao
On 2013-03-07 09:58:58 +0900, Michael Paquier wrote:
+      The recommended recovery method in such cases is to drop the
+      concurrent index and try again to perform <command>REINDEX CONCURRENTLY</>.
If an invalid index depends on a constraint like a primary key, "drop
the concurrent index" cannot actually drop the index. In this case, you
need to issue "alter table ... drop constraint ..." to recover the
situation. I think this information should be documented.
I think we just shouldn't set ->isprimary on the temporary indexes. Now
we switch only the relfilenodes and not the whole index, that should be
perfectly fine.
Sounds good. But what about other constraint cases, like a unique
constraint?
Those other cases also can be resolved by not setting ->isprimary?
We should stick with the concurrent index being a twin of the index it
rebuilds for consistency.
I don't think it's legal. We cannot simply have two indexes with
'indisprimary' set, especially not if both are valid.
Also, there will be no pg_constraint row that refers to it, which
violates very valid expectations that both users and pg may have.
So what to do with that?
Mark the concurrent index as valid, then validate it and finally mark it
as invalid inside the same transaction at phase 4?
That's moving 2 lines of code...
Sorry, phase 4 is the swap phase. Validation happens at phase 3.
Why do you want to temporarily mark it as valid? I don't see any
requirement that it is set to that during validate_index() (which imo is
badly named, but...).
I'd just set it to valid in the same transaction that does the swap.
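As an aside, the relfilenode-only swap this relies on could be observed from psql roughly like this (a hedged illustration, not code from the patch; it assumes the `hoge` example from this thread and the REINDEX CONCURRENTLY syntax under discussion):

```sql
-- Hedged sketch: since only relfilenodes are exchanged, the index OID and
-- name stay stable while the on-disk file behind them changes.
SELECT relname, oid, relfilenode
FROM pg_class
WHERE relname = 'hoge_pkey';
-- ... then run: REINDEX INDEX hoge_pkey CONCURRENTLY;
-- Re-running the query should show the same oid with a new relfilenode.
```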
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Mar 8, 2013 at 1:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres@2ndquadrant.com>
wrote:
The strange thing about "hoge_pkey_cct_cct" is that it seems to imply
that an invalid index was reindexed concurrently?
But I don't see how it could happen either. Fujii, can you reproduce it?
Yes, I can even with the latest version of the patch. The test case to
reproduce it is:
(Session 1)
CREATE TABLE hoge (i int primary key);
INSERT INTO hoge VALUES (generate_series(1,10));
(Session 2)
BEGIN;
SELECT * FROM hoge;
(keep this session as it is)
(Session 1)
SET statement_timeout TO '1s';
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
I fixed this problem in the patch attached. It was caused by 2 things:
- The concurrent index was seen as valid from other backends between
phases 3 and 4: the concurrent index was made valid at phase 4, then the
swap was done, and it was finally marked as invalid, so it remained seen
as invalid from the other sessions.
- index_set_state_flags used heap_inplace_update, which is not
completely safe at the swapping phase, so I had to extend it a bit to
use a safe simple_heap_update at the swap phase.
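A hypothetical sanity check after applying the fix (reusing the `hoge` reproduction case above; this query is not part of the patch):

```sql
-- Hedged sketch: after a successful (or properly recovered) concurrent
-- reindex, no index on the repro table should remain invalid. Zero rows
-- expected; any *_cct leftovers would indicate the bug is still present.
SELECT c.relname
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'hoge'::regclass
  AND NOT i.indisvalid;
```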
Regards,
--
Michael
Attachments:
20130308_1_remove_reltoastidxid_v4.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..6db6851 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6c0ef5b..8ba390c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..79af64f 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1337,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1397,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1436,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1464,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1491,15 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
+ bool found = false;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,37 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ toastidxs[i] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* If index is valid register it, it will be used for next processes */
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ found = true;
+ validtoastidx = toastidxs[i];
+ }
+ i++;
+ }
+
+ /* This should not happen, but check the case of no valid indexes */
+ if (!found)
+ {
+ /* No valid indexes found, so leave with an error */
+ elog(ERROR, "no valid indexes found for toast relation %s",
+ RelationGetRelationName(toastrel));
+ }
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1553,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1567,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1587,9 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1599,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for scan
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1646,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1661,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1679,17 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1708,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1797,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1816,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1839,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1884,17 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1935,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2032,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 04a927d..6384343 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 33a1803..ca0ae5e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1070,7 +1070,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1249,7 +1248,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1756,8 +1754,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1773,8 +1769,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1868,15 +1865,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2064,14 +2052,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..d3e1da4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,61 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can actually be safely done only if the
+ * relations have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, who cannot
+ * have multiple indexes on their toast relation, simply return
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1555,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1570,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries have a suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 47b6233..d3ad79f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8677,7 +8677,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8685,6 +8684,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8728,7 +8729,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8807,8 +8809,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 0e265db..e065e86 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -576,8 +576,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -589,7 +589,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8404458..7076fd6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130308_2_reindex_concurrently_v21.patch (application/octet-stream)
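For readers skimming the patch below, here is a short sketch of the intended workflow, including the recovery path described in the documentation changes. Table and index names are hypothetical, and keyword placement follows the synopsis in the attached reindex.sgml diff:

```sql
-- Rebuild all indexes of a table without blocking concurrent
-- reads or writes; cannot be run inside a transaction block.
REINDEX TABLE CONCURRENTLY my_table;

-- If the rebuild fails (for instance on a uniqueness violation),
-- an invalid index carrying the _cct suffix is left behind:
-- drop it, then retry the concurrent rebuild.
DROP INDEX my_index_cct;
REINDEX INDEX CONCURRENTLY my_index;
```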
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..1f7d046 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUSION</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ defined using constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,111 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ being rebuilt is first entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is done,
+ the old and new indexes are swapped. During this phase the concurrent
+ index is marked as valid, then swapped with the old index and marked as
+ invalid. An exclusive lock is taken during this phase. Finally, two more
+ transactions are used to mark the concurrent index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and run <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during processing has a name ending with
+ the suffix <literal>_cct</>. This also works with indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +394,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild the indexes of a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ca0ae5e..2f2f183 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index used as a duplicate of an existing
+ * index, as done during a concurrent operation. This index can also be on
+ * a toast relation. Sufficient locks are assumed to be already taken on
+ * the related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild; there is no other way to ask for it in the grammar
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1088,6 +1098,419 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, to build the column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum != 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the name picked conflicts with any existing name, and
+ * change it if so.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed, so as to block only schema changes.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old index with the new one in a concurrent context. For the time being
+ * what is done here is switching the relation relfilenode of the indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Release the valid state of a given index and then release the cache of
+ * its parent relation. This function should be called when initializing an
+ * index drop in a concurrent context before setting the index as dead if
+ * if called in a concurrent context.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of an index concurrent
+ * process. Deletion has to go through performDeletion, otherwise the
+ * dependencies of the index would not get dropped. At this point all the
+ * indexes are already considered as invalid and dead, so they can be
+ * dropped without using any concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is no longer alive; if it were, it
+ * might still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ Assert(!indexForm->indislive);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1317,7 +1740,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1399,17 +1821,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1437,63 +1850,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1506,13 +1864,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -2983,27 +3335,32 @@ validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
- * flags that denote the index's state. We must use an in-place update of
- * the pg_index tuple, because we do not have exclusive lock on the parent
- * table and so other sessions might concurrently be doing SnapshotNow scans
- * of pg_index to identify the table's indexes. A transactional update would
- * risk somebody not seeing the index at all. Because the update is not
- * transactional and will not roll back on error, this must only be used as
- * the last step in a transaction that has not made any transactional catalog
- * updates!
+ * flags that denote the index's state. When called in a concurrent
+ * context, we must use an in-place update of the pg_index tuple, because
+ * we do not have exclusive lock on the parent table and so other sessions
+ * might concurrently be doing SnapshotNow scans of pg_index to identify
+ * the table's indexes. A transactional update would risk somebody not
+ * seeing the index at all. Because the in-place update is not
+ * transactional and will not roll back on error, in the concurrent case
+ * this must only be used as the last step in a transaction that has not
+ * made any other transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
-index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+ /*
+ * Assert that the current xact hasn't done any transactional updates;
+ * in a non-concurrent context there is nothing to worry about.
+ */
+ Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3063,8 +3420,20 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
- /* ... and write it back in-place */
- heap_inplace_update(pg_index, indexTuple);
+ /*
+ * Write the tuple back in-place in a concurrent context, or do a plain
+ * transactional update in a non-concurrent one.
+ */
+ if (concurrent)
+ {
+ heap_inplace_update(pg_index, indexTuple);
+ }
+ else
+ {
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CommandCounterIncrement();
+ CatalogUpdateIndexes(pg_index, indexTuple);
+ }
heap_close(pg_index, RowExclusiveLock);
}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..e4a1db9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,34 +681,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,79 +740,14 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,530 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given relation Oid. The relation can
+ * be either an index or a table. If a table is specified, each step of the
+ * reindexing process is applied to all of the table's indexes at once,
+ * including its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt, based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including its associated
+ * toast table indexes. If the relkind is an index, that index itself will
+ * be rebuilt. The locks taken on the parent relations and the involved
+ * indexes are kept until this transaction is committed, to protect
+ * against schema changes that might occur before a session lock is taken
+ * on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be rebuilt concurrently, so they are not included.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Error out if the relkind is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. For
+ * each index, we first need to create a new index based on the same
+ * data; it is only registered in the catalogs here and will be built
+ * later. All the operations can be performed at the same time on all
+ * the indexes of a parent relation, including the indexes of its toast
+ * relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a plain or toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is needed on
+ * it as well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid of both indexes to protect them from being
+ * dropped, then close the relations. The entries appended to
+ * relationLocks need to be palloc'd copies: the list outlives this
+ * stack frame, so pointing at the local variable would be incorrect.
+ * The lockrelid of the parent relation is not stored here to avoid
+ * taking multiple locks on the same relation; we rely on
+ * parentRelationIds built earlier instead.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save a lock tag for each parent heap relation, for the phases where
+ * we wait out other backends that might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a palloc'd copy of the parent's lockrelid to the list */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG of this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation, keeping the lock */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the indexes.
+ * This will prevent them from making incompatible HOT updates. The
+ * indexes are marked as not ready and invalid, so that no other
+ * transaction will try to use them for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on each relation, old
+ * index and concurrent index, to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping transactions open for an unnecessarily long time. Each build
+ * creates the data of a concurrent index that will replace an old one.
+ * Before building, we need to wait until no running transaction could
+ * still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start a new transaction for this index's concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so reopen
+ * it to fetch the details needed. Note that the parent relation Oid has
+ * to be saved before closing the index: dereferencing rd_index after
+ * index_close would be incorrect.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ heapOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of the new index */
+ index_concurrent_build(heapOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs
+ * that might have occurred in the parent table.
+ *
+ * We once again wait until no transaction can have the table open with
+ * an index marked as read-only for updates. Each index validation is
+ * done in a separate transaction, to avoid keeping transactions open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for this concurrent
+ * index validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid in the sense that it contains all
+ * the currently interesting tuples. However, it might not contain
+ * tuples deleted just before the reference snapshot was taken, so we
+ * need to wait out the transactions that might have snapshots older
+ * than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, swap each
+ * concurrent index with its corresponding old index. The concurrent
+ * index is marked as valid before performing the swap; the index left
+ * holding the old data is invalidated once the swap is done, so that
+ * other backends stop using it once the swapping transaction commits.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation. AccessExclusiveLock is
+ * taken here rather than a lower lock level to reduce the likelihood
+ * of deadlock, as a ShareUpdateExclusiveLock is already held at
+ * session level.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that, since an exclusive lock is held on the
+ * relations involved, it is safe to update pg_index transactionally
+ * here, hence the non-concurrent call.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * The swap is done, so mark as invalid the index that is now holding
+ * the old data.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes; mark them as
+ * dead so that they are no longer used by any transaction. Each
+ * operation is performed in a separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, as this
+ * session itself already holds sufficient locks on it.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or dependencies related to the old indexes will not
+ * be dropped. The internal mechanism of DROP INDEX CONCURRENTLY is not
+ * used, as the indexes here are already considered dead and invalid, so
+ * they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this concurrent index within the transaction started above */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the parent
+ * table and on its indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1961,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1988,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2107,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2186,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2231,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2246,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for system catalogs, but it
+ * is for a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2338,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed, including pg_class, with the normal process,
+ * as they could be corrupted and the concurrent process might itself
+ * use them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d3ad79f..a810ef4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when created in a concurrent context,
+ * and this code path cannot be taken by CREATE INDEX CONCURRENTLY, as
+ * that feature is not available for exclusion constraints; hence this
+ * code path can only be reached by REINDEX CONCURRENTLY. In this case
+ * the same index exists in parallel to this one, so we can bypass this
+ * check, as it has already been done on the parallel index. If
+ * exclusion constraints are ever supported by CREATE INDEX
+ * CONCURRENTLY, this will need to be removed or completed for that
+ * purpose.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transactions hold conflicting locks on the relations
+ * referred to by the given LOCKTAGs. To do this, inquire which xacts
+ * currently would conflict with lockmode on each relation -- ie, which
+ * ones have a lock that permits writing the relation. Then wait for each
+ * of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was
+ * taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..6b1576d 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,28 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -99,7 +120,9 @@ extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
-extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+extern void index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
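The waiting logic added to procarray.c above, in WaitForOldSnapshots, repeatedly rechecks which virtual transactions still hold old snapshots, so that it never keeps blocking on a transaction that has already gone idle with xmin zero. As a rough illustration only, here is a toy model of that recheck loop in Python (plain values stand in for vxids, a callback stands in for GetCurrentVirtualXIDs; none of this is PostgreSQL code):

```python
def wait_for_old_snapshots(old_snapshots, get_current_vxids, wait_for):
    """Toy model of WaitForOldSnapshots' recheck loop.

    old_snapshots: mutable list of vxids, with None marking entries already
    found uninteresting; get_current_vxids(): returns the set of vxids that
    still hold old snapshots; wait_for(vxid): blocks until that xact ends.
    """
    for i in range(len(old_snapshots)):
        if old_snapshots[i] is None:
            continue  # found uninteresting in a previous cycle
        if i > 0:
            # See if anything's changed: any remaining vxid that no longer
            # shows up in the current set has gone idle and can be forgotten.
            newer = set(get_current_vxids())
            for j in range(i, len(old_snapshots)):
                if old_snapshots[j] is not None and old_snapshots[j] not in newer:
                    old_snapshots[j] = None  # not there anymore
        if old_snapshots[i] is not None:
            wait_for(old_snapshots[i])

# Example: vxid 102 goes idle after the initial scan, so it is never waited on.
waited = []
snapshots = [101, 102, 103]
wait_for_old_snapshots(snapshots, lambda: {101, 103}, waited.append)
assert waited == [101, 103]
```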
On Fri, Mar 8, 2013 at 10:00 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Mar 8, 2013 at 1:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Mar 7, 2013 at 7:19 AM, Andres Freund <andres@2ndquadrant.com> wrote:
The strange thing about "hoge_pkey_cct_cct" is that it seems to imply
that an invalid index was reindexed concurrently?
But I don't see how it could happen either. Fujii, can you reproduce it?
Yes, I can, even with the latest version of the patch. The test case to
reproduce it is:

(Session 1)
CREATE TABLE hoge (i int primary key);
INSERT INTO hoge VALUES (generate_series(1,10));

(Session 2)
BEGIN;
SELECT * FROM hoge;
(keep this session as it is)

(Session 1)
SET statement_timeout TO '1s';
REINDEX TABLE CONCURRENTLY hoge;
\d hoge
REINDEX TABLE CONCURRENTLY hoge;
\d hoge

I fixed this problem in the patch attached. It was caused by 2 things:
- The concurrent index was seen as valid from other backends between phases 3
and 4: the concurrent index is made valid at phase 4, then the swap is done,
and it is finally marked as invalid, so it remains invalid as seen from the
other sessions.
- index_set_state_flags used heap_inplace_update, which is not completely
safe at the swapping phase, so I had to extend it a bit to use a safe
simple_heap_update at the swap phase.
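To make the fixed ordering concrete, here is a toy sketch of phase 4 (plain Python, not the actual catalog machinery): the concurrent index is marked valid, the two indexes are swapped, and only then is the old index marked invalid, all within a single transaction, so other sessions can only observe the state before or after the whole sequence:

```python
class Index:
    """Toy stand-in for a pg_index entry; name stands in for the identity
    being swapped (relfilenode/name), valid for indisvalid."""
    def __init__(self, name, valid):
        self.name, self.valid = name, valid

def phase4_swap(old, cct):
    # begin transaction
    cct.valid = True                             # mark concurrent index valid
    old.name, cct.name = cct.name, old.name      # swap old index and its concurrent
    old.valid = False                            # old index, now under the _cct
                                                 # name, becomes invalid
    # commit: other sessions see all three changes at once

old = Index("hoge_pkey", valid=True)
cct = Index("hoge_pkey_cct", valid=False)
phase4_swap(old, cct)
assert cct.name == "hoge_pkey" and cct.valid
assert old.name == "hoge_pkey_cct" and not old.valid
```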
Thanks!
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUSION</> constraint need to be dropped with <literal>ALTER TABLE
Typo: s/EXCLUSION/EXCLUDE
I encountered a segmentation fault when I ran REINDEX CONCURRENTLY.
The test case to reproduce the segmentation fault is:
1. Install btree_gist
2. Run btree_gist's regression test (i.e., make installcheck)
3. Log in contrib_regression database after the regression test
4. Execute REINDEX TABLE CONCURRENTLY moneytmp
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Mar 9, 2013 at 1:37 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUSION</> constraint need to be dropped with <literal>ALTER TABLE
Typo: s/EXCLUSION/EXCLUDE
Thanks. This is corrected.
I encountered a segmentation fault when I ran REINDEX CONCURRENTLY.
The test case to reproduce the segmentation fault is:
1. Install btree_gist
2. Run btree_gist's regression test (i.e., make installcheck)
3. Log in contrib_regression database after the regression test
4. Execute REINDEX TABLE CONCURRENTLY moneytmp
Oops. I simply forgot to take into account the case of system attributes
when building column names in index_concurrent_create. Fixed in new version
attached.
Regards,
--
Michael
Attachments:
20130309_1_remove_reltoastidxid_v4.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..6db6851 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -313,9 +313,13 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
" ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6c0ef5b..8ba390c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..79af64f 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple
+ * identical indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1337,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1397,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1436,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1464,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1491,15 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx = NULL;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
+ bool found = false;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,37 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We use only the first valid index, but a lock needs to be taken
+ * on all of them.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ toastidxs[i] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Remember the first valid index; it is used for the scans below */
+ if (!found && toastidxs[i]->rd_index->indisvalid)
+ {
+ found = true;
+ validtoastidx = toastidxs[i];
+ }
+ i++;
+ }
+
+ /* Sanity check: a toast relation should have at least one valid index */
+ if (!found)
+ {
+ /* No valid indexes found, so leave with an error */
+ elog(ERROR, "no valid indexes found for toast relation %s",
+ RelationGetRelationName(toastrel));
+ }
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1553,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1567,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1587,9 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the index list of the toast relation is computed */
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1599,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first available index for the scan.
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1646,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1661,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1679,17 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1708,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1797,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1816,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1839,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1884,17 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1935,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2032,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 04a927d..6384343 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 33a1803..ca0ae5e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1070,7 +1070,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1249,7 +1248,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1756,8 +1754,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1773,8 +1769,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1868,15 +1865,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2064,14 +2052,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..d3e1da4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,61 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all
+ * of their indexes. The swap can be done safely only if both
+ * relations have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair of indexes */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * This code path is only taken by shared catalogs, which cannot
+ * have multiple indexes on their toast relation, so simply raise
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1555,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1570,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast index name; each
+ * following entry gets a numbered suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 47b6233..d3ad79f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8677,7 +8677,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8685,6 +8684,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8728,7 +8729,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ RelationGetIndexList(rel);
+ reltoastidxids = list_copy(rel->rd_indexlist);
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8807,8 +8809,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 0e265db..e065e86 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -576,8 +576,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -589,7 +589,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* The size is computed using all the indexes available */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8404458..7076fd6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130309_2_reindex_concurrently_v22.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..5ba057c 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To rebuild the index without interfering
+ with production you can either drop the index and reissue the
+ <command>CREATE INDEX CONCURRENTLY</> command, or use <command>REINDEX
+ CONCURRENTLY</> directly. Indexes of toast relations can also be
+ rebuilt with <command>REINDEX CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,111 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and, in
+ addition, it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace that
+ of the index to be rebuilt is first entered into the system catalogs in one
+ transaction; then two table scans occur in two more transactions. Once this
+ is done, the old and new indexes are swapped: the concurrent index is
+ marked as valid, its storage is exchanged with that of the old index, and
+ it is then marked as invalid. An exclusive lock is taken during this
+ phase. Finally, two additional transactions are used to mark the
+ concurrent index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and run <command>REINDEX CONCURRENTLY</> again. The concurrent
+ index created during the processing has a name ending in the suffix
+ <literal>_cct</>. This works as well for indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are rebuilt
+ concurrently if the relation they depend on is a non-system relation.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +394,17 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
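The multi-transaction flow documented above can be summarized as a small phase list. This is a toy illustration of the sequence only, not backend code; the phase wording is mine:

```python
# Transaction phases of REINDEX CONCURRENTLY for one index, as described
# in the documentation above: one catalog transaction, two scan
# transactions, the swap, then two cleanup transactions.

PHASES = [
    "create catalog entry for the new index (invalid, not ready)",  # txn 1
    "build the new index (first table scan)",                       # txn 2
    "validate the new index (second table scan)",                   # txn 3
    "swap storage of old and new indexes under exclusive lock",     # txn 4
    "mark the concurrent index as not ready",                       # txn 5
    "drop the concurrent index",                                    # txn 6
]

def run_phases(on_phase=print):
    """Walk the phases in order; each maps to (at least) one transaction."""
    for i, phase in enumerate(PHASES, start=1):
        on_phase(f"transaction {i}: {phase}")

run_phases()
```

The point of the split is that only phase 4 needs an exclusive lock, and only briefly; every table-scanning phase runs without blocking writers.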
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ca0ae5e..e265619 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be a toast relation. Sufficient locks are normally taken on
+ * the related relations once this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently supported only during a concurrent index
+ * rebuild; there is no other way to ask for it in the grammar anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1088,6 +1098,426 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if the index is initdeferred; this depends on its
+ * dependent constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get expressions associated with this index to build the column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the name picked has any conflict with existing names and
+ * change it.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
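The column-name de-duplication loop in `index_concurrent_create` above appends a numeric suffix on conflict and clips the base name so the result still fits in `NAMEDATALEN`. A standalone sketch of that loop (Python, illustrative only; the real code clips byte-wise with `pg_mbcliplen` to respect multibyte encodings):

```python
NAMEDATALEN = 64  # PostgreSQL's compile-time identifier length limit

def choose_name(origname, taken):
    """Return origname unchanged if it does not conflict; otherwise append
    an increasing numeric suffix, clipping the base so that base + suffix
    fits within NAMEDATALEN - 1 characters."""
    curname = origname
    j = 1
    while curname in taken:
        suffix = str(j)
        # Clip the original name, not the previous attempt, so the suffix
        # always replaces the tail rather than accumulating.
        nlen = NAMEDATALEN - 1 - len(suffix)
        curname = origname[:nlen] + suffix
        j += 1
    return curname

# Example: three identically named columns get distinct names.
taken = set()
names = []
for col in ["col", "col", "col"]:
    name = choose_name(col, taken)
    taken.add(name)
    names.append(name)
```

Note that, like the C loop, no clipping happens unless there is a conflict, so already-unique long names pass through untouched.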
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed, so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new index in a concurrent context. For the time being
+ * what is done here is switching the relation relfilenode of the indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
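The swap itself touches nothing but the `relfilenode` fields of the two `pg_class` rows, which is why it is cheap enough to do under a brief `AccessExclusiveLock`. A minimal dict-based sketch of the idea (illustrative, not backend code; OIDs and names are made up):

```python
# A tiny stand-in for pg_class: each row carries a relfilenode, the
# identifier of the relation's on-disk storage. Exchanging the two fields
# makes the old index name point at the freshly built storage, and the
# concurrent index name point at the stale storage to be dropped.

pg_class = {
    "idx":     {"relfilenode": 16384},  # old index, stale storage
    "idx_cct": {"relfilenode": 16501},  # concurrent index, fresh storage
}

def swap_relfilenodes(catalog, old_index, new_index):
    """Exchange the relfilenode of two index entries; the real code does
    this with two simple_heap_update calls under AccessExclusiveLock."""
    old_row, new_row = catalog[old_index], catalog[new_index]
    old_row["relfilenode"], new_row["relfilenode"] = (
        new_row["relfilenode"], old_row["relfilenode"])

swap_relfilenodes(pg_class, "idx", "idx_cct")
```

After the swap, dropping `idx_cct` removes the stale storage while queries keep using the name `idx` throughout.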
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
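The ordering inside `index_concurrent_set_dead` matters: the wait for conflicting lock holders must complete before the flags are cleared, so that no backend can still be inserting into the index once it is marked dead. A sketch of that sequence as a log of steps (illustrative only):

```python
# The four steps of index_concurrent_set_dead, in the order the function
# performs them. Reordering them would let a backend keep using an index
# that the catalog already says is dead.

log = []

def set_dead(index):
    log.append(f"wait for transactions holding conflicting locks on {index}")
    log.append(f"transfer predicate locks from {index} to its heap relation")
    log.append(f"clear indisready and indislive on {index}")
    log.append("invalidate the relcache of the parent table")

set_dead("idx_cct")
```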
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. In a concurrent context, this function should be
+ * called at the start of the index drop process, before the index is
+ * marked as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of an index concurrent
+ * process. Deletion is done through performDeletion, or the dependencies
+ * of the index would not get dropped. At this point all the indexes are already
+ * considered as invalid and dead so they can be dropped without using any
+ * concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index dropped here is not alive; if it were, it might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ Assert(!indexForm->indislive);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1317,7 +1747,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1399,17 +1828,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1437,63 +1857,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1506,13 +1871,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -2983,27 +3342,32 @@ validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
- * flags that denote the index's state. We must use an in-place update of
- * the pg_index tuple, because we do not have exclusive lock on the parent
- * table and so other sessions might concurrently be doing SnapshotNow scans
- * of pg_index to identify the table's indexes. A transactional update would
- * risk somebody not seeing the index at all. Because the update is not
- * transactional and will not roll back on error, this must only be used as
- * the last step in a transaction that has not made any transactional catalog
- * updates!
+ * flags that denote the index's state. If this function is called in a
+ * concurrent process, we use an in-place update of the pg_index tuple,
+ * because we do not have exclusive lock on the parent table and so other
+ * sessions might concurrently be doing SnapshotNow scans of pg_index to
+ * identify the table's indexes. A transactional update would risk somebody
+ * not seeing the index at all. Because the update is not transactional
+ * and will not roll back on error, this must only be used as the last step
+ * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
-index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+ /*
+ * Assert that the current xact hasn't done any transactional updates;
+ * there is nothing to worry about in a non-concurrent context.
+ */
+ Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3063,8 +3427,20 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
- /* ... and write it back in-place */
- heap_inplace_update(pg_index, indexTuple);
+ /*
+ * Write it back in-place in a concurrent context, and do a simple update
+ * for a non-concurrent context.
+ */
+ if (concurrent)
+ {
+ heap_inplace_update(pg_index, indexTuple);
+ }
+ else
+ {
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CommandCounterIncrement();
+ CatalogUpdateIndexes(pg_index, indexTuple);
+ }
heap_close(pg_index, RowExclusiveLock);
}
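The change to `index_set_state_flags` above picks one of two write paths: an in-place, non-transactional overwrite for concurrent operations (visible to SnapshotNow scanners immediately, cannot be rolled back) versus a normal transactional update otherwise. A toy sketch of that branching (illustrative; the dicts stand in for pg_index tuples):

```python
# Two ways of updating a pg_index "tuple": in place, or by installing a
# new tuple version. Only the second participates in rollback, which is
# why the in-place path must be the last action of its transaction.

def update_index_flags(tuple_store, indexoid, flags, concurrent):
    row = tuple_store[indexoid]
    if concurrent:
        # In-place update: mutate the existing tuple directly, as
        # heap_inplace_update does.
        row.update(flags)
    else:
        # Transactional update: create a replacement tuple version, as
        # simple_heap_update does; on abort the old version would remain.
        new_row = dict(row)
        new_row.update(flags)
        tuple_store[indexoid] = new_row

store = {42: {"indisvalid": True, "indisready": True}}
update_index_flags(store, 42, {"indisvalid": False}, concurrent=True)
update_index_flags(store, 42, {"indisready": False}, concurrent=False)
```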
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..e4a1db9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,34 +681,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,79 +740,14 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,530 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each step of the
+ * reindexing is applied to all of the table's indexes at once, including
+ * the indexes of its dependent toast table.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including the indexes of
+ * its associated toast table. If the relkind is an index, that index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session lock
+ * is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid
+ * indexes cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes to rebuild, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index, a new index based on the same
+ * definition as the old one; at this point it is only registered in the
+ * catalogs and will be built later. These operations can be performed
+ * at once for all the indexes of a parent relation, including the
+ * indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; it might be a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock on it is
+ * needed as well
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lock IDs of the old and concurrent indexes to protect them
+ * from being dropped, then close the relations. A palloc'd copy of
+ * each lock ID must be stored: appending the address of the local
+ * variable would leave every list cell pointing at the same stack
+ * slot, which becomes invalid once this function returns. The lock ID
+ * of the parent relation is not taken here to avoid taking multiple
+ * locks on the same relation; we rely on parentRelationIds built
+ * earlier instead.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock tag of each parent relation for the following wait
+ * phases, where other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the parent relation's lock ID to the list of
+ * locked relations; storing the address of a stack variable here would
+ * leave a dangling pointer in the list.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transaction will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build concurrent indexes in a separate transaction for each index to
+ * avoid having open transactions for an unnecessary long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * the old indexes. Before doing that, we need to wait until no running
+ * transaction can still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by the previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /* Perform concurrent build of the new index */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Only close the index now: the Relation must not be dereferenced
+ * once it has been closed, so the build above needs indexRel to
+ * still be open.
+ */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table in the meantime.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open for
+ * an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * the concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not contain tuples deleted just before the
+ * reference snapshot was taken, so we need to wait for the transactions
+ * that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. Each
+ * concurrent index is marked as valid before performing the swap; once
+ * the swap is done, the entry now holding the old index data is marked
+ * invalid, so that other backends stop using it as soon as the
+ * associated transaction commits.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation. An AccessExclusiveLock
+ * is taken here rather than a lower-level lock to reduce the
+ * likelihood of deadlock, as a ShareUpdateExclusiveLock is already
+ * held at session level.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that as an exclusive lock is taken on the relations
+ * involved, it is safe to call this function in a non-concurrent
+ * context.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * The swap is done; now mark the entry holding the old index data as invalid.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes; mark them as dead so that the transactions that might still
+ * use them stop doing so. Each operation is performed in a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation, since
+ * this session already holds sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion or related dependencies will not be dropped for the old
+ * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not used
+ * as here the indexes are already considered as dead and invalid, so they
+ * will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this concurrent index, which now holds the old index data */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is to release the session-level locks on the
+ * parent tables and the indexes involved.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1961,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1988,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2107,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2186,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2231,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2246,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed on system catalogs, but it is
+ * on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system catalogs concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2338,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) via the normal process,
+ * since they could be corrupted and the concurrent process itself
+ * relies on them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d3ad79f..a810ef4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when created in a concurrent context,
+ * and since CREATE INDEX CONCURRENTLY does not support exclusion
+ * constraints, this code path can only be reached through REINDEX
+ * CONCURRENTLY. In that case a twin of this index exists in parallel,
+ * so this check can be bypassed: it has already been performed on the
+ * other index. If exclusion constraints become supported by CREATE
+ * INDEX CONCURRENTLY in the future, this shortcut should be revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the given
+ * relations. To do this, inquire which xacts currently would conflict
+ * with lockmode on the relation referred to by each LOCKTAG -- ie, which
+ * ones have a lock that permits writing the relation -- and then wait
+ * for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because such a snapshot might not contain tuples deleted just before
+ * it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..6b1576d 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,28 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -99,7 +120,9 @@ extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
-extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+extern void index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create a table to check that foreign key dependencies are switched when indexes are swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create a table to check that foreign key dependencies are switched when indexes are swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On Sat, Mar 9, 2013 at 1:31 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Sat, Mar 9, 2013 at 1:37 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUSION</> constraint need to be dropped with <literal>ALTER TABLE

Typo: s/EXCLUSION/EXCLUDE
Thanks. This is corrected.
I encountered a segmentation fault when I ran REINDEX CONCURRENTLY.
The test case to reproduce the segmentation fault is:
1. Install btree_gist
2. Run btree_gist's regression test (i.e., make installcheck)
3. Log in contrib_regression database after the regression test
4. Execute REINDEX TABLE CONCURRENTLY moneytmp
Oops. I simply forgot to take into account the case of system attributes
when building column names in index_concurrent_create. Fixed in new version
attached.
Thanks for updating the patch!
I found a problem: the patch changed the behavior of
ALTER TABLE SET TABLESPACE so that it also moves
the indexes on the specified table to the new tablespace. Per the
documentation of ALTER TABLE, this is not the right behavior.
I think that it's worth adding a new option for concurrent rebuilding
to the reindexdb command. It's better to implement this separately
from the core patch, though.
You need to add a description of the locking used by REINDEX CONCURRENTLY
to mvcc.sgml, I think.
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
Obviously REINDEX cannot rebuild a table ;)
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
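To make the review easier, the intended behavior can be sketched as a
two-session scenario using the objects from the regression tests above
(the interleaving shown is illustrative, not an actual test):

```sql
-- Session 1: rebuild all indexes of the table without blocking writers
REINDEX TABLE CONCURRENTLY concur_reindex_tab;

-- Session 2 (while session 1 is still running): reads and writes keep
-- working, whereas a plain REINDEX would block them until the rebuild
-- finishes
INSERT INTO concur_reindex_tab VALUES (3, 'c');
SELECT c1 FROM concur_reindex_tab WHERE c1 = 3;
```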
On Fri, Mar 8, 2013 at 1:46 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Why do you want to temporarily mark it as valid? I don't see any
requirement that it is set to that during validate_index() (which imo is
badly named, but...).
I'd just set it to valid in the same transaction that does the swap.
+1. I cannot yet see why the isprimary flag needs to be set even
on the invalid index. With the current patch, we can easily get into an
inconsistent situation, i.e., a table having more than one primary
key index.
Regards,
--
Fujii Masao
On Sun, Mar 10, 2013 at 3:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Thanks for updating the patch!
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
This new SQL doesn't seem to be right. Old one doesn't pick up any indexes
other than toast index, but new one seems to do.
Regards,
--
Fujii Masao
On Sun, Mar 10, 2013 at 4:50 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sun, Mar 10, 2013 at 3:48 AM, Fujii Masao <masao.fujii@gmail.com>
wrote:Thanks for updating the patch!
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " JOIN pg_catalog.pg_index p "
+ " ON i.reloid = p.indrelid "
+ "WHERE p.indexrelid >= %u ", FirstNormalObjectId));
This new SQL doesn't seem to be right. Old one doesn't pick up any indexes
other than toast index, but new one seems to do.
Indeed, it was selecting all indexes...
I replaced it with this query, restricting the selection to indexes of
toast relations:
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indrelid IN (SELECT reltoastrelid "
+ " FROM pg_class "
+ " WHERE oid >= %u "
+ " AND reltoastrelid != %u)",
+ FirstNormalObjectId, InvalidOid));
Will send patch soon...
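For clarity, here is the corrected query expanded outside the C string
literals; the literal values below merely stand in for the
FirstNormalObjectId and InvalidOid parameters substituted at run time:

```sql
SELECT indexrelid
  FROM pg_index
 WHERE indrelid IN (SELECT reltoastrelid
                      FROM pg_class
                     WHERE oid >= 16384          -- FirstNormalObjectId
                       AND reltoastrelid != 0);  -- InvalidOid
```

This picks up only the indexes whose parent relation is the toast table of
a user relation, instead of every index in the database.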
--
Michael
Please find attached an updated version. I also corrected the problem with
the pg_upgrade query that fetches the OIDs of the indexes of toast relations.
On Sun, Mar 10, 2013 at 3:48 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
I found a problem: the patch changed the behavior of
ALTER TABLE SET TABLESPACE so that it also moves
the indexes on the specified table to the new tablespace. Per the
documentation of ALTER TABLE, this is not the right behavior.
Oops. Fixed in the attached patch. The bug was in the reltoastidxid patch,
not in the REINDEX CONCURRENTLY core.
I think that it's worth adding a new option for concurrent rebuilding
to the reindexdb command. It's better to implement this separately
from the core patch, though.
Yeah, agreed. It is not that much complicated. And this should be done
after this patch is finished.
You need to add a description of the locking used by REINDEX CONCURRENTLY
to mvcc.sgml, I think.
OK, I added a reference to that in the docs. I also added a paragraph
about the locks used during the process.
+ Rebuild a table concurrently:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
OK... The documentation should be polished more... I changed this
paragraph a bit to mention that read and write operations can be performed
on the table in this case.
--
Michael
Attachments:
20130310_1_remove_reltoastidxid_v5.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..763c703 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -310,12 +310,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indrelid IN (SELECT reltoastrelid "
+ " FROM pg_class "
+ " WHERE oid >= %u "
+ " AND reltoastrelid != %u)",
+ FirstNormalObjectId, InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6c0ef5b..8ba390c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..79af64f 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1238,7 +1238,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1257,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple
+ * identical indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1327,10 +1337,13 @@ toast_save_datum(Relation rel, Datum value,
*/
if (!OidIsValid(rel->rd_toastoid))
{
- /* normal case: just choose an unused OID */
+ /*
+ * normal case: just choose an unused OID. Simply use the first
+ * index relation.
+ */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
}
else
@@ -1384,7 +1397,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[0]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1436,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1464,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1491,15 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx = NULL;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
+ bool found = false;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1508,37 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index, but taking a lock on all
+ * of them is necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ toastidxs[i] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Remember the first valid index; it is used for the scans below */
+ if (!found && toastidxs[i]->rd_index->indisvalid)
+ {
+ found = true;
+ validtoastidx = toastidxs[i];
+ }
+ i++;
+ }
+
+ /* This should not happen, but complain if no valid index was found */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation %s",
+ RelationGetRelationName(toastrel));
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1553,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1567,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1537,6 +1587,9 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ScanKeyData toastkey;
SysScanDesc toastscan;
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexList(toastrel);
+
/*
* Setup a scan key to find chunks with matching va_valueid
*/
@@ -1546,9 +1599,10 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
ObjectIdGetDatum(valueid));
/*
- * Is there any such chunk?
+ * Is there any such chunk? Use the first index available for scan
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ linitial_oid(toastrel->rd_indexlist),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1592,7 +1646,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1661,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1679,17 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1708,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1797,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1751,7 +1816,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1839,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1884,17 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexList(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1935,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[0],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2032,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 04a927d..6384343 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 33a1803..ca0ae5e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1070,7 +1070,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1249,7 +1248,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1756,8 +1754,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1773,8 +1769,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1868,15 +1865,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2064,14 +2052,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..d3e1da4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,61 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can only be done safely if both toast
+ * relations have exactly one index.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, which cannot
+ * have multiple indexes on their toast relation, simply raise
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1555,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1570,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries have a suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 47b6233..04393d4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8677,7 +8677,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8685,6 +8684,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8728,7 +8729,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ RelationGetIndexList(toastRel);
+ reltoastidxids = list_copy(toastRel->rd_indexlist);
+ relation_close(toastRel, NoLock);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8807,8 +8815,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 0e265db..e065e86 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -576,8 +576,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -589,7 +589,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the available indexes */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8404458..7076fd6 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2669,10 +2669,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2681,7 +2680,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2706,11 +2704,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
-
- /* every toast table has an index */
- appendPQExpBuffer(upgrade_buffer,
- "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
}
}
else
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
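As a side note on the first patch: with pg_class.reltoastidxid gone, the index of a toast relation has to be looked up through pg_index, as the updated pg_statio_all_tables definition does. A minimal query sketch (the table name my_table is just a placeholder):

```sql
-- Find the index(es) of a table's toast relation via pg_index,
-- now that pg_class.reltoastidxid no longer exists.
SELECT i.indexrelid::regclass AS toast_index
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.oid = 'my_table'::regclass;
```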
Attachment: 20130310_2_reindex_concurrently_v23.patch (application/octet-stream)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index db820d6..e77b058 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..a8b5fc9 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should either drop the index and reissue
+ <command>CREATE INDEX CONCURRENTLY</>, or run the <command>REINDEX
+ CONCURRENTLY</> command directly. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,119 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuild and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuilt is first entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is done,
+ the old and fresh indexes are swapped: the concurrent index is marked as
+ valid, then swapped with the old one and marked as invalid. An exclusive
+ lock is taken during this phase. Finally, two additional transactions are
+ used to mark the concurrent index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending with
+ the suffix <literal>_cct</>. This also works with indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved in the operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is done with <literal>SHARE UPDATE EXCLUSIVE</literal>,
+ except during the relation swap, where an <literal>ACCESS EXCLUSIVE</literal>
+ lock is taken.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +402,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table while allowing concurrent read and write operations on the
+ relations involved:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ca0ae5e..e265619 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index as a duplicate of an existing
+ * index, as done during a concurrent reindex operation. The index can
+ * also be on a toast relation. Sufficient locks are normally already
+ * taken on the related relations when this is called during a
+ * concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently supported only during a concurrent index
+ * rebuild; there is no other way to ask for it in the grammar
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1088,6 +1098,426 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if the index is initially deferred; this depends on the
+ * constraint that depends on it.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get expressions associated with this index, needed to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the chosen name conflicts with any existing column name,
+ * and adjust it if so.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char*)concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Only low-level locks are taken
+ * while this operation is performed, so only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /* We have to re-build the IndexInfo struct, since it was lost in commit */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Replace the old index by the new one in a concurrent context. For the time
+ * being what is done here is switching the relfilenode of the two indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen as dead by all backends.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG *locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait, if necessary, until no running transaction could be
+ * using the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ if (locktag)
+ WaitForVirtualLocks(*locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initializing
+ * an index drop in a concurrent context, before the index is set as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent index process.
+ * Deletion is done through performDeletion, otherwise dependencies of the
+ * index would not get dropped. At this point all the indexes are already
+ * considered as invalid and dead so they can be dropped without using any
+ * concurrent options.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it
+ * might still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+ Assert(!indexForm->indislive);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1317,7 +1747,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1399,17 +1828,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1437,63 +1857,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, &heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1506,13 +1871,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -2983,27 +3342,32 @@ validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
- * flags that denote the index's state. We must use an in-place update of
- * the pg_index tuple, because we do not have exclusive lock on the parent
- * table and so other sessions might concurrently be doing SnapshotNow scans
- * of pg_index to identify the table's indexes. A transactional update would
- * risk somebody not seeing the index at all. Because the update is not
- * transactional and will not roll back on error, this must only be used as
- * the last step in a transaction that has not made any transactional catalog
- * updates!
+ * flags that denote the index's state. If this function is called in a
+ * concurrent process, we use an in-place update of the pg_index tuple,
+ * because we do not have exclusive lock on the parent table and so other
+ * sessions might concurrently be doing SnapshotNow scans of pg_index to
+ * identify the table's indexes. A transactional update would risk somebody
+ * not seeing the index at all. Because the update is not transactional
+ * and will not roll back on error, this must only be used as the last step
+ * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
-index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+ /*
+ * Assert that the current xact hasn't done any transactional updates;
+ * there is nothing to worry about in a non-concurrent context.
+ */
+ Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3063,8 +3427,20 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
- /* ... and write it back in-place */
- heap_inplace_update(pg_index, indexTuple);
+ /*
+ * Write it back in-place in a concurrent context, and do a simple update
+ * for a non-concurrent context.
+ */
+ if (concurrent)
+ {
+ heap_inplace_update(pg_index, indexTuple);
+ }
+ else
+ {
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CommandCounterIncrement();
+ CatalogUpdateIndexes(pg_index, indexTuple);
+ }
heap_close(pg_index, RowExclusiveLock);
}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..e4a1db9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,34 +681,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,79 +740,14 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
@@ -853,7 +755,7 @@ DefineIndex(IndexStmt *stmt,
* relcache inval on the parent table to force replanning of cached plans.
* Otherwise existing sessions might fail to use the new index where it
* would be useful. (Note that our earlier commits did not create reasons
- * to replan; so relcache flush on the index itself was sufficient.)
+ * to replan; relcache flush on the index itself was sufficient.)
*/
CacheInvalidateRelcacheByRelid(heaprelid.relId);
@@ -873,6 +775,530 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is performed at the same time on all the table's indexes, including the
+ * indexes of its dependent toast relation.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including its associated
+ * toast table indexes. If the relkind is an index, this index itself will
+ * be rebuilt. The locks taken on the parent relations and the involved
+ * indexes are kept until this transaction is committed, to protect
+ * against schema changes that might occur before a session lock is taken
+ * on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return an error if the relation type is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * must first create a new index, based on the same definition as the
+ * old one, which is only registered in the catalogs and will be built
+ * later. All the operations can be performed at once on all the
+ * indexes of a parent relation, including the indexes of its toast
+ * relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; it might be a toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NIL,
+ NIL,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock must be taken
+ * on it as well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each index relation from being
+ * dropped, then close the relations. The entries must be palloc'd:
+ * appending the address of the local variable would leave every list
+ * cell pointing at the same storage. The lockrelid of the parent
+ * relation is not taken here to avoid taking multiple locks on the
+ * same relation; we rely on parentRelationIds built earlier instead.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks for the following visibility checks; other
+ * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add the lockrelid of the parent relation to the list of locked relations */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The new
+ * index is marked as not ready and invalid, so that no other
+ * transaction will try to use it for INSERT or SELECT.
+ *
+ * Before committing, take a session-level lock on the parent relation
+ * and on both the old and the new index, to ensure that none of them
+ * are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction to avoid
+ * keeping transactions open for an unnecessarily long time. Each
+ * concurrent index built here will replace its old counterpart.
+ * Before doing that, we must wait until no running transaction could
+ * still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+
+ /* Only close the index now; its fields must not be used once closed */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index to mark it as ready
+ * for inserts. Once we commit this transaction, any new transaction
+ * that opens the table must insert new entries into the index for
+ * insertions and non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs
+ * that may have occurred in the parent table.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * the concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * This concurrent index is now valid, as it contains all the necessary
+ * tuples. However, it might not have taken into account tuples deleted
+ * before the reference snapshot was taken, so we must wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we must
+ * swap each concurrent index with its corresponding old index. Each
+ * concurrent index is marked as valid before the swap, and the index
+ * holding the old data is marked as invalid once the swap is done,
+ * making it unusable by other backends once the associated transaction
+ * commits.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the relations, invalidating the cache of the associated
+ * relation. An AccessExclusiveLock is taken here rather than a lower
+ * lock level to reduce the likelihood of deadlock, as a
+ * ShareUpdateExclusiveLock is already held at session level.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that since an exclusive lock is taken on the
+ * relations involved, it is safe to call this function in a
+ * non-concurrent fashion.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Now that the swap is done, mark as invalid the index that holds the
+ * old data (now attached to the Oid of the concurrent index).
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes, and old
+ * transactions might still be using them. Mark them as dead so that
+ * they are no longer maintained. Each operation is performed in a
+ * separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Finish the index invalidation and set it as dead. It is not
+ * necessary to wait for virtual locks on the parent relation as it
+ * is already sure that this session holds sufficient locks.
+ */
+ index_concurrent_set_dead(indOid, relOid, NULL);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies of the old indexes would not be
+ * dropped. The internal mechanism of DROP INDEX CONCURRENTLY is not
+ * used here, as the indexes are already considered dead and invalid,
+ * so they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this concurrent index */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the
+ * parent table and on the indexes of the table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1961,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +1988,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2107,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2186,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2231,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2246,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for SYSTEM, but it is for
+ * DATABASE.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2338,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed with a normal process, including pg_class, as
+ * they could be corrupted, and the concurrent process itself might use
+ * them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 04393d4..61862e8 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -904,6 +904,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by
+ * a failed concurrent operation, and allow it to be dropped. For the
+ * time being, this only concerns indexes of toast relations that became
+ * invalid during a REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and this code path cannot be reached by CREATE INDEX
+ * CONCURRENTLY, as that feature is not available for exclusion
+ * constraints; hence it can only be reached by REINDEX CONCURRENTLY.
+ * In that case an identical index exists in parallel to this one, so
+ * this check can be bypassed: it has already been done on the other
+ * index. If exclusion constraints become supported by CREATE INDEX
+ * CONCURRENTLY in the future, this will have to be revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 867b0c0..b93d90c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0787d2f..f087219 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6806,29 +6806,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the
+ * relations referred to by the given LOCKTAGs. To do this, inquire which
+ * xacts currently would conflict with lockmode on each relation -- ie,
+ * which ones have a lock that permits writing the relation. Then wait for
+ * each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, as that snapshot might not contain tuples that were deleted just
+ * before it was taken. Obtain a list of VXIDs of such transactions, and
+ * wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..6b1576d 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,28 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG *locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -99,7 +120,9 @@ extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
-extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+extern void index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
I have been working on improving the code of the two patches:
1) reltoastidxid removal:
- Improved the mechanism in tuptoaster.c that fetches the first valid index,
used for toast value deletion and fetching
- Added a macro called RelationGetIndexListIfValid, which avoids recomputing
the index list with list_copy the way RelationGetIndexList does. Not using a
macro resulted in increased shared memory usage when multiple toast values
were added inside the same query (stuff like "INSERT INTO tab VALUES
(generate_series(1,1000), '2k_long_text')")
- Fixed a bug with pg_dump and binary upgrade: a given toast relation needs
one valid index.
2) reindex concurrently:
- corrected some comments
- fixed index_concurrent_set_dead, where the process did not wait until other
backends had released their locks on the parent relation
- added an error message in index_concurrent_drop when an attempt is made to
drop a live index. Dropping a live index while holding only a ShareUpdate
lock is dangerous
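The caching idea behind RelationGetIndexListIfValid described in point 1 can be sketched as follows. This is a hedged illustration in Python, not the actual C macro: the names Relation, rd_indexlist, and the catalog stand-in are toy mock-ups of the PostgreSQL structures, and the point is only to show why "compute the cache once, read it in place" beats "return a list_copy on every call" when one index lookup happens per toasted value.

```python
class Relation:
    """Toy stand-in for PostgreSQL's Relation cache entry (illustration only)."""
    def __init__(self, index_oids):
        self._catalog_index_oids = list(index_oids)  # stand-in for a catalog scan
        self.rd_indexlist = None   # cached index list; None means not yet computed
        self.copies_made = 0       # demo bookkeeping: how many copies were handed out

def relation_get_index_list(rel):
    """Mimics RelationGetIndexList: ensure the cache, then return a fresh copy."""
    if rel.rd_indexlist is None:
        rel.rd_indexlist = list(rel._catalog_index_oids)
    rel.copies_made += 1
    return list(rel.rd_indexlist)  # a list_copy on every single call

def relation_get_index_list_if_valid(rel):
    """Mimics the macro: ensure the cache is valid, no copy; callers read
    rel.rd_indexlist directly afterwards."""
    if rel.rd_indexlist is None:
        rel.rd_indexlist = list(rel._catalog_index_oids)

rel = Relation([16385, 16386])
for _ in range(1000):                  # e.g. one lookup per toasted value in a query
    relation_get_index_list_if_valid(rel)
assert rel.copies_made == 0            # no per-call copies were made
assert rel.rd_indexlist == [16385, 16386]
```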
I am also planning to test the potential performance impact of the patch
removing reltoastidxid with scripts of the type attached. I don't really
know if it can be quantified, but I'll give it a try with some methods (not
yet completely defined).
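The attached test scripts themselves are not reproduced here; as a generic sketch of the before/after timing methodology (in Python, with a toy in-memory workload standing in for the SQL inserts mentioned above — every name below is illustrative, not from the patch):

```python
import time

def best_of(fn, repeats=3):
    """Run fn() several times and keep the best wall-clock time, a common
    way to reduce noise when comparing a workload before and after a patch."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def workload():
    # Toy stand-in: build 1000 rows with a ~2kB payload each. A real run
    # would instead execute something like
    #   INSERT INTO tab VALUES (generate_series(1,1000), '2k_long_text')
    # against a server built with and without the patch.
    payload = "x" * 2048
    return [(i, payload) for i in range(1000)]

elapsed = best_of(workload)
print("best of 3 runs: %.6f s" % elapsed)
```

Comparing the best-of-N figures from both builds gives a first rough estimate of whether the extra index-list handling is measurable at all.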
--
Michael
Attachments:
20130313_1_remove_reltoastidxid_v6.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..763c703 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -310,12 +310,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indrelid IN (SELECT reltoastrelid "
+ " FROM pg_class "
+ " WHERE oid >= %u "
+ " AND reltoastrelid != %u)",
+ FirstNormalObjectId, InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6c0ef5b..8ba390c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..e1af68d 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,13 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
@@ -1237,8 +1239,8 @@ static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1259,29 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1346,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
@@ -1367,7 +1383,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,7 +1401,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1440,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1468,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1495,14 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1511,22 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1541,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1555,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1531,11 +1569,28 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,7 +1603,8 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1556,6 +1612,11 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ pfree(toastidxs);
+
return result;
}
@@ -1573,7 +1634,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
@@ -1591,8 +1652,8 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1668,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1686,21 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1719,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1808,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1750,8 +1826,8 @@ toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1850,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1895,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1947,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2044,36 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+/* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index in list of indexes for a toast relation. Those relations
+ * need to be already open prior to calling this routine.
+ */
+static Relation
+toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+{
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 04a927d..6384343 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -767,7 +767,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 33a1803..ca0ae5e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1070,7 +1070,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1249,7 +1248,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1756,8 +1754,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1773,8 +1769,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1868,15 +1865,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2064,14 +2052,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8ab8c17..d3e1da4 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1169,8 +1169,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1379,18 +1377,61 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can actually be safely done only if the
+ * relations have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, which cannot
+ * have multiple indexes on their toast relation, simply return
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1514,12 +1555,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1528,11 +1570,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries have a suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8bb8f54..f852aad 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8667,7 +8667,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8675,6 +8674,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8718,7 +8719,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ RelationGetIndexList(toastRel);
+ reltoastidxids = list_copy(toastRel->rd_indexlist);
+ relation_close(toastRel, NoLock);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8797,8 +8805,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 0e265db..e065e86 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -576,8 +576,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -589,7 +589,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the indexes available */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 8404458..b2bee9d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2672,16 +2672,17 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
+ pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2707,7 +2708,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* every toast table has at least one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
/*
+ * RelationGetIndexListIfValid
+ * Get the cached index list of a relation, computing it only if not yet valid.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+ if (rel->rd_indexvalid == 0) \
+ RelationGetIndexList(rel); \
+} while(0)
+
+/*
* Routines to open (lookup) and close a relcache entry
*/
extern Relation RelationIdGetRelation(Oid relationId);
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130313_2_reindex_concurrently_v24.patch (application/octet-stream)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index db820d6..e77b058 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..a8b5fc9 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you can either drop the index and reissue the
+ <command>CREATE INDEX CONCURRENTLY</> command, or use <command>REINDEX
+ CONCURRENTLY</>. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This also applies to <literal>UNIQUE</> indexes
+ created by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,119 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index whose storage will replace that
+ of the index to be rebuilt is first entered into the system catalogs in one
+ transaction, then two table scans occur in two more transactions. Once this
+ is done, the old and new indexes are swapped: the concurrent index is marked
+ as valid, its storage is exchanged with that of the old index, and the old
+ index is marked as invalid. An exclusive lock is taken during this swap
+ phase. Finally, two additional transactions are used to mark the old index
+ as not ready and then drop it.
+ </para>
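As an illustration, the phases described above all happen behind a single user-visible command; `idx` is a hypothetical index name, and `idx_cct` the transient name generated internally:

```sql
-- Rebuild "idx" without blocking concurrent reads and writes.
-- Internally this creates a transient index "idx_cct", builds and
-- validates it, swaps its storage with that of "idx", then drops
-- the leftover index.
REINDEX INDEX idx CONCURRENTLY;
```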
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This also works for indexes of toast relations.
+ </para>
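A minimal recovery sequence, assuming the failed rebuild left behind a transient index named `idx_cct` (hypothetical names, following the `_cct` suffix convention):

```sql
-- Remove the invalid transient index left by the failed rebuild...
DROP INDEX idx_cct;
-- ...then simply retry the concurrent rebuild of the original index.
REINDEX INDEX idx CONCURRENTLY;
```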
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved in the operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is performed with a <literal>SHARE UPDATE EXCLUSIVE</literal>
+ lock, except during the relation swap, where an <literal>ACCESS EXCLUSIVE</literal>
+ lock is taken.
+ </para>
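To illustrate the lock levels involved, a sketch of two concurrent sessions (table and column names are hypothetical):

```sql
-- Session 1: the rebuild runs under SHARE UPDATE EXCLUSIVE,
-- which does not conflict with DML.
REINDEX TABLE tab CONCURRENTLY;

-- Session 2, at the same time: inserts and updates proceed normally,
-- blocking only during the brief ACCESS EXCLUSIVE swap phase.
INSERT INTO tab (col) VALUES (1);
UPDATE tab SET col = 2 WHERE col = 1;
```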
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +402,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild the indexes of a table while allowing read and write operations
+ on the involved relations:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ca0ae5e..3950b8b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that duplicates an existing index,
+ * as done during a concurrent reindex operation. The index can also belong
+ * to a toast relation. Sufficient locks are assumed to already be held on
+ * the related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild; there is no other way to ask for it in the grammar anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1088,6 +1098,438 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into the catalogs and needs to be built
+ * later on. This is called during concurrent index processing. The heap
+ * relation on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* The concurrent index uses the same index information as the old index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initially deferred; this depends on
+ * its parent constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, used to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the chosen name conflicts with existing names, and
+ * adjust it if so.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed so as to block only schema changes.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to rebuild the IndexInfo struct, since it was lost in the
+ * commit of the transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old and new indexes in a concurrent context. For the time being
+ * this only switches the relfilenode of the two indexes. If extra operations
+ * become necessary during a concurrent swap, they should be added here.
+ * AccessExclusiveLock is taken on the swapped index relations and held until
+ * the end of the transaction in which this function is called.
+ * Note: a weaker lock could be taken if the catalog caches, which use
+ * SnapshotNow, were correctly MVCC'd.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function, the index is
+ * seen as dead by all backends.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could still be using
+ * the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. In a concurrent context, this function should be
+ * called before setting the index as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent index process.
+ * Deletion has to go through performDeletion, or the dependencies of the
+ * index would not get dropped. At this point the index is already
+ * considered invalid and dead, so it can be dropped without any
+ * concurrent options, as it is certain that it will not interact with
+ * other server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it might
+ * still be used by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1317,7 +1759,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1399,17 +1840,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1437,63 +1869,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1506,13 +1883,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -2983,27 +3354,32 @@ validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
- * flags that denote the index's state. We must use an in-place update of
- * the pg_index tuple, because we do not have exclusive lock on the parent
- * table and so other sessions might concurrently be doing SnapshotNow scans
- * of pg_index to identify the table's indexes. A transactional update would
- * risk somebody not seeing the index at all. Because the update is not
- * transactional and will not roll back on error, this must only be used as
- * the last step in a transaction that has not made any transactional catalog
- * updates!
+ * flags that denote the index's state. If this function is called in a
+ * concurrent process, we use an in-place update of the pg_index tuple,
+ * because we do not have exclusive lock on the parent table and so other
+ * sessions might concurrently be doing SnapshotNow scans of pg_index to
+ * identify the table's indexes. A transactional update would risk somebody
+ * not seeing the index at all. Because the update is not transactional
+ * and will not roll back on error, this must only be used as the last step
+ * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
-index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+ /*
+ * Assert that the current xact hasn't done any transactional updates;
+ * there is nothing to worry about in a non-concurrent context.
+ */
+ Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3063,8 +3439,20 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
- /* ... and write it back in-place */
- heap_inplace_update(pg_index, indexTuple);
+ /*
+ * In a concurrent context, write the tuple back in-place; otherwise,
+ * do a plain transactional update.
+ */
+ if (concurrent)
+ {
+ heap_inplace_update(pg_index, indexTuple);
+ }
+ else
+ {
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CommandCounterIncrement();
+ CatalogUpdateIndexes(pg_index, indexTuple);
+ }
heap_close(pg_index, RowExclusiveLock);
}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..2ea997f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,34 +681,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,79 +740,14 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
@@ -873,6 +775,544 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given relation Oid. The relation can
+ * be either an index or a table. If a table is specified, each step of the
+ * reindexing process is performed on all of the table's indexes at once,
+ * including its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including the
+ * indexes of its associated toast table. If the relkind is an index, the
+ * index itself will be rebuilt. The locks taken on the parent relations
+ * and the involved indexes are kept until this transaction is committed,
+ * to protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return an error if the relation type is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. First,
+ * for each index a new index is created, based on the same definition as
+ * the old one, but only registered in the catalogs; it will be built
+ * later. All these operations can be done at once for a parent relation,
+ * including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which might be a plain or toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also needed
+ * on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save a palloc'd copy of the lockrelid of each index, to protect the
+ * concurrent relations from being dropped, then close the relations.
+ * A copy is needed because the list must outlive this loop iteration.
+ * The lockrelid of the parent relation is not taken here, to avoid
+ * taking multiple locks on the same relation; we rely instead on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locktag of each parent relation for the following wait
+ * phases, during which other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a palloc'd copy of the parent relation's lockrelid to the list */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG of this parent relation for the wait phases */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close the heap relation, keeping its lock */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The new
+ * indexes are marked as not ready and invalid, so that no other
+ * transactions will try to use them for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on each parent relation,
+ * old index and concurrent index, to ensure that none of them are
+ * dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping a transaction open for an unnecessarily long time. Each
+ * concurrent build will replace one of the old indexes. Before doing
+ * that, we need to wait until no running transaction could have the
+ * parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start a new transaction for this index's concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so
+ * reopen it and fetch what we need before closing it again; the
+ * Relation pointer must not be dereferenced after index_close.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ relOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, NoLock);
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(relOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table in the meantime.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for validating this
+ * concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * This concurrent index is now valid, as it contains all the tuples
+ * necessary. However, it might not contain tuples deleted just before
+ * the reference snapshot was taken, so we need to wait out the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each old index with its corresponding concurrent index. Each
+ * concurrent index is marked as valid before performing the swap; once
+ * the swap is done, the index holding the old data is marked invalid,
+ * making it unusable by other backends once the associated transaction
+ * is committed.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation. An AccessExclusiveLock
+ * is taken here, and not a lower-level lock, as the relfilenode swap
+ * requires that no other backend uses these relations. The
+ * ShareUpdateExclusiveLock already taken at the session level reduces
+ * the likelihood of deadlock.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that, as an exclusive lock is taken on the
+ * relations involved, it is safer to call this function in a
+ * non-concurrent context.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Now mark the old index as invalid, the swap is done.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes. Mark them as dead, waiting first for the transactions that
+ * might still use them. Each operation is performed in a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation before
+ * setting the index as dead.
+ */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies related to the old indexes will
+ * not be dropped. The internal mechanism of DROP INDEX CONCURRENTLY is
+ * not used, as here the indexes are already considered dead and
+ * invalid, so they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this concurrent index, which is already dead and invalid */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is to release the session-level locks on the
+ * parent table and on the indexes of the table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get a fresh snapshot for the end of the process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
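Conceptually, the swap in phase 4 exchanges the relfilenodes of the two index entries, so the original index keeps its OID and its dependencies while pointing at the freshly built data. Below is a minimal standalone sketch of that exchange; the struct and function names are illustrative only, as the real index_concurrent_swap() operates on pg_class tuples:

```c
#include <assert.h>

/*
 * Illustrative stand-in for a pg_class index entry: the OID is the
 * index's identity and never changes, while the relfilenode names the
 * on-disk file and is what the swap exchanges.
 */
typedef struct FakeIndex
{
	unsigned int oid;			/* identity, preserved by the swap */
	unsigned int relfilenode;	/* on-disk file, exchanged by the swap */
} FakeIndex;

/* Exchange the on-disk files of a concurrent index and its old index */
static void
fake_index_swap(FakeIndex *concurrent, FakeIndex *old)
{
	unsigned int tmp = concurrent->relfilenode;

	concurrent->relfilenode = old->relfilenode;
	old->relfilenode = tmp;
}
```

After such a swap, the entry left holding the old relfilenode is the one that phases 5 and 6 mark dead and drop.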
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1975,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +2002,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
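The "cct" suffix goes through ChooseRelationName(), which, like makeObjectName(), truncates the base name so that the result still fits in NAMEDATALEN. A self-contained sketch of that truncation rule follows; the function name is an illustration, and the real code additionally resolves collisions with existing relation names:

```c
#include <assert.h>
#include <string.h>

#define NAMEDATALEN 64			/* as in PostgreSQL: 63 bytes + NUL */

/* Build "oldname_cct", truncating oldname so the result fits a name */
static void
concurrent_index_name(const char *oldname, char *dst)
{
	const char *suffix = "_cct";
	size_t		maxbase = NAMEDATALEN - 1 - strlen(suffix);
	size_t		baselen = strlen(oldname);

	if (baselen > maxbase)
		baselen = maxbase;		/* truncate the base, keep the suffix */
	memcpy(dst, oldname, baselen);
	strcpy(dst + baselen, suffix);
}
```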
@@ -1673,18 +2121,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2200,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2245,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2260,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed on system catalogs, but it
+ * is on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2352,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed, pg_class included, with the normal process,
+ * as they could be corrupted and the concurrent process itself might
+ * use them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f852aad..30e7b8e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -899,6 +899,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index only exists when created in a concurrent context,
+ * and this code path cannot be taken by CREATE INDEX CONCURRENTLY,
+ * since that feature is not available for exclusion constraints; hence
+ * this code path can only be reached by REINDEX CONCURRENTLY. In that
+ * case the same index exists in parallel to this one, so we can bypass
+ * this check, as it has already been done on the other index. If
+ * exclusion constraints are supported in the future for CREATE INDEX
+ * CONCURRENTLY, this should be removed or revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fd3823a..27408b4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3618,6 +3618,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9d07f30..de4bbea 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6784,29 +6784,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction that could conflict with lockmode holds a lock
+ * on any of the relations referred to by the given LOCKTAGs. To do this,
+ * inquire which xacts currently would conflict with lockmode on each
+ * relation -- ie, which ones have a lock that permits writing the
+ * relation. Then wait for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
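The structure of the function above, gathering every conflict list before waiting on anything, reduces to two loops. In this standalone sketch, plain ints stand in for virtual transaction ids (0 terminates each list) and the wait is faked by recording the id; it only illustrates the collect-then-wait pattern, not the lock-table machinery:

```c
#include <assert.h>

#define MAX_WAITS 16

static int	wait_log[MAX_WAITS];
static int	wait_count = 0;

/* Fake VirtualXactLock(): record which transaction we "waited" on */
static void
fake_vxact_wait(int vxid)
{
	wait_log[wait_count++] = vxid;
}

/*
 * conflicts[i] is a zero-terminated list of vxids conflicting with
 * lock i.  All lists are collected before the first wait, mirroring
 * WaitForMultipleVirtualLocks().
 */
static void
wait_for_multiple_locks(int nlocks, const int conflicts[][4])
{
	int			collected[MAX_WAITS];
	int			n = 0;
	int			i, j;

	/* Step 1: collect every conflicting vxid up front */
	for (i = 0; i < nlocks; i++)
		for (j = 0; conflicts[i][j] != 0; j++)
			collected[n++] = conflicts[i][j];

	/* Step 2: wait for each of them to commit or abort */
	for (i = 0; i < n; i++)
		fake_vxact_wait(collected[i]);
}
```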
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples that were deleted just before
+ * it was taken. Obtain a list of VXIDs of such transactions, and wait
+ * for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
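The recheck inside the loop above boils down to a set-difference pass: any old vxid that no longer shows up in a fresh listing is marked invalid so we never wait on it. A standalone sketch of just that pass, with ints standing in for VirtualTransactionId and 0 meaning invalid:

```c
#include <assert.h>

/*
 * Mark as invalid (0) every entry of old_vxids that does not appear in
 * newer_vxids; such transactions have finished or gone idle with xmin
 * zero, so there is no need to wait for them.
 */
static void
invalidate_vanished(int *old_vxids, int n_old,
					const int *newer_vxids, int n_newer)
{
	int			j, k;

	for (j = 0; j < n_old; j++)
	{
		if (old_vxids[j] == 0)
			continue;			/* already found uninteresting */
		for (k = 0; k < n_newer; k++)
			if (old_vxids[j] == newer_vxids[k])
				break;
		if (k >= n_newer)
			old_vxids[j] = 0;	/* not there anymore: forget it */
	}
}
```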
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index fb323f7..5cffe2d 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,28 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -99,7 +120,9 @@ extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
-extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+extern void index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> I have been working on improving the code of the 2 patches:
I found pg_dump dumps even the invalid index. But pg_dump should
ignore the invalid index?
This problem exists even without the REINDEX CONCURRENTLY patch. So we might
need to implement the bugfix patch separately rather than including the
bugfix code in your patches.
Probably a backport would be required. Thoughts?
Should we add the concurrent reindex option to the reindexdb command?
This can really be a
separate patch, though.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013/03/17, at 0:35, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> I have been working on improving the code of the 2 patches:
> I found pg_dump dumps even the invalid index. But pg_dump should
> ignore the invalid index?
> This problem exists even without the REINDEX CONCURRENTLY patch. So we
> might need to implement the bugfix patch separately rather than including
> the bugfix code in your patches.
> Probably a backport would be required. Thoughts?
Hum... Indeed, they shouldn't be included... Perhaps this is already known?
> Should we add the concurrent reindex option to the reindexdb command?
> This can really be a separate patch, though.
Yes, they definitely should be separated for simplicity.
Btw, those patches seem trivial, I'll send them.
Michael
Please find attached the patches wanted:
- 20130317_reindexdb_concurrently.patch, adding an option -c/--concurrently
to reindexdb
Note that I added an error inside reindexdb for options "-s -c" as REINDEX
CONCURRENTLY does not support SYSTEM.
- 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
not take a dump of invalid indexes. This patch can be backpatched to 9.0.
On Sun, Mar 17, 2013 at 3:31 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On 2013/03/17, at 0:35, Fujii Masao <masao.fujii@gmail.com> wrote:
>> On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
>> I found pg_dump dumps even the invalid index. But pg_dump should
>> ignore the invalid index?
>> This problem exists even without the REINDEX CONCURRENTLY patch. So we
>> might need to implement the bugfix patch separately rather than
>> including the bugfix code in your patches.
>> Probably a backport would be required. Thoughts?
> Hum... Indeed, they shouldn't be included... Perhaps this is already known?
Note that there have been some recent discussions about that. This
*problem* also concerned pg_upgrade.
/messages/by-id/20121207141236.GB4699@alvh.no-ip.org
--
Michael
Attachments:
20130317_reindexdb_concurrently.patchapplication/octet-stream; name=20130317_reindexdb_concurrently.patchDownload
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 342e4c9..a36a016 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -18,12 +18,12 @@ static void reindex_one_database(const char *name, const char *dbname,
const char *type, const char *host,
const char *port, const char *username,
enum trivalue prompt_password, const char *progname,
- bool echo);
+ bool echo, bool concurrently);
static void reindex_all_databases(const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
const char *progname, bool echo,
- bool quiet);
+ bool quiet, bool concurrently);
static void reindex_system_catalogs(const char *dbname,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
@@ -46,6 +46,7 @@ main(int argc, char *argv[])
{"system", no_argument, NULL, 's'},
{"table", required_argument, NULL, 't'},
{"index", required_argument, NULL, 'i'},
+ {"concurrently", no_argument, NULL, 'c'},
{"maintenance-db", required_argument, NULL, 2},
{NULL, 0, NULL, 0}
};
@@ -64,6 +65,7 @@ main(int argc, char *argv[])
bool alldb = false;
bool echo = false;
bool quiet = false;
+ bool concurrently = false;
SimpleStringList indexes = {NULL, NULL};
SimpleStringList tables = {NULL, NULL};
@@ -73,7 +75,7 @@ main(int argc, char *argv[])
handle_help_version_opts(argc, argv, "reindexdb", help);
/* process command-line options */
- while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:ast:i:", long_options, &optindex)) != -1)
+ while ((c = getopt_long(argc, argv, "h:p:U:wWeqd:ast:i:c", long_options, &optindex)) != -1)
{
switch (c)
{
@@ -113,6 +115,9 @@ main(int argc, char *argv[])
case 'i':
simple_string_list_append(&indexes, optarg);
break;
+ case 'c':
+ concurrently = true;
+ break;
case 2:
maintenance_db = pg_strdup(optarg);
break;
@@ -166,7 +171,8 @@ main(int argc, char *argv[])
}
reindex_all_databases(maintenance_db, host, port, username,
- prompt_password, progname, echo, quiet);
+ prompt_password, progname, echo, quiet,
+ concurrently);
}
else if (syscatalog)
{
@@ -180,6 +186,11 @@ main(int argc, char *argv[])
fprintf(stderr, _("%s: cannot reindex specific index(es) and system catalogs at the same time\n"), progname);
exit(1);
}
+ if (concurrently)
+ {
+ fprintf(stderr, _("%s: cannot reindex system catalogs concurrently\n"), progname);
+ exit(1);
+ }
if (dbname == NULL)
{
@@ -213,7 +224,8 @@ main(int argc, char *argv[])
for (cell = indexes.head; cell; cell = cell->next)
{
reindex_one_database(cell->val, dbname, "INDEX", host, port,
- username, prompt_password, progname, echo);
+ username, prompt_password, progname, echo,
+ concurrently);
}
}
if (tables.head != NULL)
@@ -223,13 +235,15 @@ main(int argc, char *argv[])
for (cell = tables.head; cell; cell = cell->next)
{
reindex_one_database(cell->val, dbname, "TABLE", host, port,
- username, prompt_password, progname, echo);
+ username, prompt_password, progname, echo,
+ concurrently);
}
}
/* reindex database only if neither index nor table is specified */
if (indexes.head == NULL && tables.head == NULL)
reindex_one_database(dbname, dbname, "DATABASE", host, port,
- username, prompt_password, progname, echo);
+ username, prompt_password, progname, echo,
+ concurrently);
}
exit(0);
@@ -238,7 +252,8 @@ main(int argc, char *argv[])
static void
reindex_one_database(const char *name, const char *dbname, const char *type,
const char *host, const char *port, const char *username,
- enum trivalue prompt_password, const char *progname, bool echo)
+ enum trivalue prompt_password, const char *progname, bool echo,
+ bool concurrently)
{
PQExpBufferData sql;
@@ -246,13 +261,12 @@ reindex_one_database(const char *name, const char *dbname, const char *type,
initPQExpBuffer(&sql);
- appendPQExpBuffer(&sql, "REINDEX");
- if (strcmp(type, "TABLE") == 0)
- appendPQExpBuffer(&sql, " TABLE %s", name);
- else if (strcmp(type, "INDEX") == 0)
- appendPQExpBuffer(&sql, " INDEX %s", name);
- else if (strcmp(type, "DATABASE") == 0)
- appendPQExpBuffer(&sql, " DATABASE %s", fmtId(name));
+ appendPQExpBuffer(&sql, "REINDEX %s", type);
+ if (concurrently)
+ appendPQExpBuffer(&sql, " CONCURRENTLY");
+
+ appendPQExpBuffer(&sql, " %s",
+ strcmp(type, "DATABASE") == 0 ? fmtId(name) : name);
appendPQExpBuffer(&sql, ";\n");
conn = connectDatabase(dbname, host, port, username, prompt_password,
@@ -281,7 +295,8 @@ static void
reindex_all_databases(const char *maintenance_db,
const char *host, const char *port,
const char *username, enum trivalue prompt_password,
- const char *progname, bool echo, bool quiet)
+ const char *progname, bool echo, bool quiet,
+ bool concurrently)
{
PGconn *conn;
PGresult *result;
@@ -303,7 +318,7 @@ reindex_all_databases(const char *maintenance_db,
}
reindex_one_database(dbname, dbname, "DATABASE", host, port, username,
- prompt_password, progname, echo);
+ prompt_password, progname, echo, concurrently);
}
PQclear(result);
@@ -343,6 +358,7 @@ help(const char *progname)
printf(_(" %s [OPTION]... [DBNAME]\n"), progname);
printf(_("\nOptions:\n"));
printf(_(" -a, --all reindex all databases\n"));
+ printf(_(" -c, --concurrently reindex concurrently\n"));
printf(_(" -d, --dbname=DBNAME database to reindex\n"));
printf(_(" -e, --echo show the commands being sent to the server\n"));
printf(_(" -i, --index=INDEX recreate specific index(es) only\n"));
Attachment: 20130317_dump_only_valid_index.patch
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 093be9e..0559e98 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -4804,6 +4804,7 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
"i.indexrelid = c.conindid AND "
"c.contype IN ('p','u','x')) "
"WHERE i.indrelid = '%u'::pg_catalog.oid "
+ "AND i.indisvalid "
"ORDER BY indexname",
tbinfo->dobj.catId.oid);
}
On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Please find attached the patches wanted:
- 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
not take a dump of invalid indexes. This patch can be backpatched to 9.0.
Don't indisready and indislive need to be checked?
The patch seems to change pg_dump so that it ignores an invalid index only
when the remote server version >= 9.0. But why not when the remote server
version < 9.0?
I think that you should start a new thread to get more attention for this
patch if there is not enough feedback.
Note that there have been some recent discussions about that. This *problem*
also concerned pg_upgrade.
/messages/by-id/20121207141236.GB4699@alvh.no-ip.org
What's the conclusion of this discussion? pg_dump --binary-upgrade also should
ignore an invalid index? pg_upgrade needs to be changed together?
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I have been working on improving the code of the 2 patches:
1) reltoastidxid removal:
<snip>
- Fix a bug with pg_dump and binary upgrade. One valid index is necessary
for a given toast relation.
Is this bugfix related to the following?
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",
Don't indisready and indislive need to be checked?
Why is LIMIT 1 required? The toast table can have more than one toast index?
Regards,
--
Fujii Masao
On Tue, Mar 19, 2013 at 3:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Please find attached the patches wanted:
- 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
not take a dump of invalid indexes. This patch can be backpatched to 9.0.
Don't indisready and indislive need to be checked?
The patch seems to change pg_dump so that it ignores an invalid index only
when the remote server version >= 9.0. But why not when the remote server
version < 9.0?
I think that you should start a new thread to get more attention for this
patch if there is not enough feedback.
Yeah... Will send a message about that...
Note that there have been some recent discussions about that. This
*problem*
also concerned pg_upgrade.
/messages/by-id/20121207141236.GB4699@alvh.no-ip.org
What's the conclusion of this discussion? pg_dump --binary-upgrade also
should
ignore an invalid index? pg_upgrade needs to be changed together?
The conclusion is that pg_dump should not dump invalid indexes, as it would
recreate them as valid indexes during restore. However I haven't seen any
patch...
--
Michael
On Tue, Mar 19, 2013 at 3:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I have been working on improving the code of the 2 patches:
1) reltoastidxid removal:
<snip>
- Fix a bug with pg_dump and binary upgrade. One valid index is necessary
for a given toast relation.
Is this bugfix related to the following?
appendPQExpBuffer(upgrade_query,
-                 "SELECT c.reltoastrelid, t.reltoastidxid "
+                 "SELECT c.reltoastrelid, t.indexrelid "
                  "FROM pg_catalog.pg_class c LEFT JOIN "
-                 "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
-                 "WHERE c.oid = '%u'::pg_catalog.oid;",
+                 "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+                 "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+                 "LIMIT 1",
Yes.
Don't indisready and indislive need to be checked?
An index is valid only if it is already ready and live. We could add such
checks for safety, but I don't think it is necessary.
Why is LIMIT 1 required? The toast table can have more than one toast
index?
It cannot have more than one VALID index, so as long as there is a check on
indisvalid there is no need for a LIMIT condition. I only added it as a
safeguard. The same applies to the addition of conditions based on
indislive and indisready.
--
Michael
On Tue, Mar 19, 2013 at 8:54 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Mar 19, 2013 at 3:03 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Sun, Mar 17, 2013 at 9:24 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Please find attached the patches wanted:
- 20130317_dump_only_valid_index.patch, a 1-line patch that makes pg_dump
not take a dump of invalid indexes. This patch can be backpatched to 9.0.
Don't indisready and indislive need to be checked?
The patch seems to change pg_dump so that it ignores an invalid index only
when the remote server version >= 9.0. But why not when the remote server
version < 9.0?
I think that you should start a new thread to get more attention for this
patch if there is not enough feedback.
Yeah... Will send a message about that...
Note that there have been some recent discussions about that. This
*problem* also concerned pg_upgrade.
/messages/by-id/20121207141236.GB4699@alvh.no-ip.org
What's the conclusion of this discussion? pg_dump --binary-upgrade also
should ignore an invalid index? pg_upgrade needs to be changed together?
The conclusion is that pg_dump should not dump invalid indexes, as it would
recreate them as valid indexes during restore. However I haven't seen any
patch...
The fix has been done inside pg_upgrade:
http://momjian.us/main/blogs/pgblog/2012.html#December_14_2012
Nothing has been done for pg_dump.
--
Michael
Is someone planning to provide additional feedback about this patch at some
point?
Thanks,
--
Michael
Hi,
Please find new patches realigned with HEAD. There were conflicts with
commits done recently.
Thanks,
--
Michael
Attachments:
Attachment: 20130323_1_toastindex_v7.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..763c703 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -310,12 +310,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indrelid IN (SELECT reltoastrelid "
+ " FROM pg_class "
+ " WHERE oid >= %u "
+ " AND reltoastrelid != %u)",
+ FirstNormalObjectId, InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6c0ef5b..8ba390c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index db820d6..e77b058 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..a8b5fc9 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,119 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is performed,
+ the old and fresh indexes are swapped. During this phase the concurrent
+ index is first marked as valid, then swapped with the old one, which is in
+ turn marked as invalid. An exclusive lock is taken during this phase.
+ Finally, two additional transactions are used to mark the concurrent index
+ as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending with
+ the suffix <literal>_cct</>. This also applies to indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> uses <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved during operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is done with <literal>SHARE UPDATE EXCLUSIVE</literal>
+ except during relation swap where <literal>ACCESS EXCLUSIVE</literal> lock
+ is taken.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +402,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild a table while authorizing read and write operations on involved
+ relations when performed:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..e1af68d 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,13 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
@@ -1237,8 +1239,8 @@ static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1259,29 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1346,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
@@ -1367,7 +1383,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,7 +1401,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1440,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1468,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1495,14 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1511,22 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1541,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1555,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1531,11 +1569,28 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,7 +1603,8 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1556,6 +1612,11 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ pfree(toastidxs);
+
return result;
}
@@ -1573,7 +1634,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
@@ -1591,8 +1652,8 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1668,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1686,21 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1719,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1808,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1750,8 +1826,8 @@ toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1850,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1895,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1947,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2044,36 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+/* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index from the list of indexes of a toast relation. The index
+ * relations need to already be open prior to calling this routine.
+ */
+static Relation
+toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+{
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+}
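As a standalone illustration (outside the patch itself), the selection logic of toast_index_fetch_valid() can be sketched in plain C. The ToastIndex struct and first_valid_index() below are hypothetical stand-ins for PostgreSQL's Relation and the real routine, which checks rd_index->indisvalid:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a toast index entry; in the patch this is a
 * Relation whose rd_index->indisvalid flag is checked. */
typedef struct ToastIndex
{
	unsigned int oid;
	bool		indisvalid;
} ToastIndex;

/* Return the first valid index, mirroring toast_index_fetch_valid(): with
 * REINDEX CONCURRENTLY a toast table can transiently have several indexes,
 * only one of which is valid for scans. */
static ToastIndex *
first_valid_index(ToastIndex *idxs, int num_indexes)
{
	int			i;

	for (i = 0; i < num_indexes; i++)
	{
		if (idxs[i].indisvalid)
			return &idxs[i];
	}
	return NULL;			/* the real code Asserts this cannot happen */
}
```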
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0b4c659..8114d77 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -768,7 +768,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7966558..73686f6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -103,7 +105,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that duplicates an existing index,
+ * as done during a concurrent reindex operation. The index can also be
+ * on a toast relation. Sufficient locks are assumed to already be held
+ * on the related relations when this is called during such an operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently supported only during a concurrent index
+ * rebuild; there is no way to ask for it in the grammar
+ * otherwise.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1071,7 +1081,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1090,6 +1099,438 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if the index is initdeferred; this depends on its
+ * dependent constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get expressions associated with this index, needed to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the chosen name conflicts with any existing names, and
+ * adjust it if necessary.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Only low-level locks are taken
+ * when this operation is performed, so only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in
+ * commit of the transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old and new indexes in a concurrent context. For the time being
+ * this only switches the relfilenode of the two indexes. If extra
+ * operations become necessary during a concurrent swap, they should be
+ * added here. AccessExclusiveLock is taken on the swapped index relations
+ * until the end of the transaction in which this function is called.
+ * Note: a lower-level lock could be taken if catalog scans under
+ * SnapshotNow were correctly MVCC-safe.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could still be using
+ * the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initiating an
+ * index drop, before setting the index as dead when run in a concurrent
+ * context.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of concurrent index
+ * processing. Deletion is done through performDeletion, as otherwise the
+ * dependencies of the index would not get dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without any concurrent options since it is certain that they will not
+ * interact with other server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it
+ * might still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1253,7 +1694,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1326,7 +1766,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1408,17 +1847,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1446,63 +1876,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1515,13 +1890,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -1763,8 +2132,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +2147,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +2243,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2430,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
@@ -3005,27 +3361,32 @@ validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
- * flags that denote the index's state. We must use an in-place update of
- * the pg_index tuple, because we do not have exclusive lock on the parent
- * table and so other sessions might concurrently be doing SnapshotNow scans
- * of pg_index to identify the table's indexes. A transactional update would
- * risk somebody not seeing the index at all. Because the update is not
- * transactional and will not roll back on error, this must only be used as
- * the last step in a transaction that has not made any transactional catalog
- * updates!
+ * flags that denote the index's state. If this function is called in a
+ * concurrent process, we use an in-place update of the pg_index tuple,
+ * because we do not have exclusive lock on the parent table and so other
+ * sessions might concurrently be doing SnapshotNow scans of pg_index to
+ * identify the table's indexes. A transactional update would risk somebody
+ * not seeing the index at all. Because the update is not transactional
+ * and will not roll back on error, this must only be used as the last step
+ * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
-index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+ /*
+ * Assert that the current xact hasn't done any transactional updates;
+ * there is nothing to worry about in a non-concurrent context.
+ */
+ Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3085,8 +3446,20 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
- /* ... and write it back in-place */
- heap_inplace_update(pg_index, indexTuple);
+ /*
+ * In a concurrent context, write it back in-place; otherwise do a
+ * normal transactional update.
+ */
+ if (concurrent)
+ {
+ heap_inplace_update(pg_index, indexTuple);
+ }
+ else
+ {
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CommandCounterIncrement();
+ CatalogUpdateIndexes(pg_index, indexTuple);
+ }
heap_close(pg_index, RowExclusiveLock);
}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index ef9c5f1..5ef164b 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1176,8 +1176,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1396,19 +1394,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if each toast
+ * relation has exactly one index.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- is_internal,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair of indexes */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ is_internal,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, which cannot
+ * have multiple indexes on their toast relations, simply raise
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1533,12 +1574,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ List *toastidxs;
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ /*
+ * Fetch the index list before closing the toast relation, as the
+ * relcache entry must not be accessed once the relation is closed.
+ */
+ toastidxs = RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1547,11 +1589,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName, true);
+ /* ... and its indexes too */
+ foreach(lc, toastidxs)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries have a suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName, true);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..2ea997f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,34 +681,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,79 +740,14 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
@@ -873,6 +775,544 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is done in parallel with all the table's indexes as well as its dependent
+ * toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including its
+ * associated toast table indexes. If the relkind is an index, this index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex concurrently this type of relation")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of concurrently rebuilding the indexes. We
+ * first need to create for each index a new index based on the same
+ * data, except that it is only registered in the catalogs and will be
+ * built afterwards. It is possible to perform all the operations on all
+ * the indexes of a parent relation at the same time, including the
+ * indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; it might be a plain or toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is needed
+ * on it as well
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each concurrent relation from being
+ * dropped, then close the relations. A palloc'd copy of each lockrelid
+ * is appended, as the local variable does not survive this loop
+ * iteration. The lockrelid of the parent relation is not taken here,
+ * to avoid taking multiple locks on the same relation; instead we rely
+ * on parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(lockrelid)),
+ &lockrelid, sizeof(lockrelid)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(lockrelid)),
+ &lockrelid, sizeof(lockrelid)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks for the following visibility checks, as other
+ * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the lockrelid of the parent relation to the
+ * list of locked relations; the local variable does not survive this
+ * loop iteration.
+ */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(lockrelid)),
+ &lockrelid, sizeof(lockrelid)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transaction will try
+ * to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the
+ * concurrent index and its copy, to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid having transactions open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait on the parent
+ * relations until no running transaction could have the parent table
+ * of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Index relation has been closed by the previous commit, so reopen
+ * it. Save the parent relation Oid before closing the index, as the
+ * relcache entry must not be accessed once the lock is released.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ relOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(relOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs that
+ * might have occurred in the parent table.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is done
+ * in a separate transaction to avoid keeping a transaction open for an
+ * unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * This concurrent index is now valid, as it contains all the
+ * necessary tuples. However, it might not have taken into account
+ * tuples deleted just before the reference snapshot was taken, so we
+ * need to wait out the transactions that might have older snapshots
+ * than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The
+ * concurrent index is marked as valid before performing the swap; once
+ * the swap is done, the index holding the old data is marked as invalid,
+ * making it unusable by other backends once the associated transaction
+ * is committed.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the relations and mark the relcache of the associated
+ * relation as invalid. An AccessExclusiveLock, and not a lower lock,
+ * is taken here to reduce the likelihood of deadlock, as a
+ * ShareUpdateExclusiveLock is already held for the session.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that as an exclusive lock is taken on the relations
+ * involved, it is safer to call this function in a non-concurrent
+ * context.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Now that the swap is done, mark the old index as invalid.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the other
+ * indexes, and some transactions might still be using them. They need
+ * to be marked as dead. Each operation is performed in a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation
+ * before setting the index as dead.
+ */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion or related dependencies will not be dropped for the old
+ * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not used
+ * as here the indexes are already considered as dead and invalid, so they
+ * will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Perform the drop of this concurrent index */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1975,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +2002,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2121,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2200,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2245,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2260,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for system catalogs, but it
+ * is for a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2352,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process might itself
+ * use them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 536d232..d83e0b6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -899,6 +899,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
@@ -8726,7 +8758,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8734,6 +8765,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8780,7 +8813,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on the toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+
+ /* RelationGetIndexList returns a copy that we keep and free below */
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, NoLock);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8861,8 +8901,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context. As CREATE INDEX CONCURRENTLY is not available for exclusion
+ * constraints, this code path can only be reached during REINDEX
+ * CONCURRENTLY. In that case the same index exists in parallel to this
+ * one, so the check can be bypassed: it has already been done on the
+ * other index existing in parallel. If exclusion constraints become
+ * supported by CREATE INDEX CONCURRENTLY, this shortcut will have to be
+ * removed or revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fd3823a..27408b4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3618,6 +3618,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0d82141..2d91451 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6761,29 +6761,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
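As exercised by the regression tests included in this patch, the grammar above places CONCURRENTLY between the object keyword and the name, so the accepted forms look like the following (object names are from the tests):

```sql
-- Rebuild one index without taking strong locks on the parent table
REINDEX INDEX CONCURRENTLY concur_reindex_ind1;

-- Rebuild all indexes of a table, toast indexes included
REINDEX TABLE CONCURRENTLY concur_reindex_tab;

-- Rejected: concurrent reindex of system catalogs
REINDEX SYSTEM CONCURRENTLY postgres;  -- ERROR
```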
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index cb59f13..388685a 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -575,8 +575,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -588,7 +588,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the
+ * relations referred to by the given LOCKTAGs. To do this, inquire which
+ * xacts currently would conflict with lockmode on each relation -- ie,
+ * which ones have a lock that permits writing the relation. Then wait
+ * for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
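The recheck strategy described in the WaitForOldSnapshots comment above — re-fetch the set of running vxids before each wait and forget entries that have disappeared — can be modeled in a few lines. This is an illustrative Python sketch, not PostgreSQL code; `get_running_vxids` and `wait_for` stand in for GetCurrentVirtualXIDs and VirtualXactLock:

```python
def wait_for_old_snapshots(old_vxids, get_running_vxids, wait_for):
    """Wait for each vxid in old_vxids, skipping any that is no longer
    running by the time its turn comes (mirrors the recheck loop above)."""
    old = list(old_vxids)          # None marks "found uninteresting"
    waited = []
    for i in range(len(old)):
        if old[i] is None:
            continue               # invalidated in a previous cycle
        if i > 0:
            # See if anything's changed: drop vxids that are gone already.
            running = set(get_running_vxids())
            for j in range(i, len(old)):
                if old[j] is not None and old[j] not in running:
                    old[j] = None  # not there anymore, forget about it
        if old[i] is not None:
            wait_for(old[i])
            waited.append(old[i])
    return waited
```

For instance, if vxid 2 exits while we are still waiting on vxid 1, the recheck before slot 1 invalidates it and only vxids 1 and 3 are actually waited on.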
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is computed using all the available indexes */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
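With reltoastidxid gone, calculate_toast_table_size above sums every index present on the toast relation instead of a single hard-wired one. The difference is observable through the SQL-level size functions built on it; a hypothetical check (table name illustrative):

```sql
-- pg_table_size() accounts for the toast table and all of its indexes;
-- if an interrupted REINDEX CONCURRENTLY left an extra _cct toast index
-- behind, the reported size would grow accordingly.
SELECT pg_size_pretty(pg_table_size('concur_reindex_tab'));
```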
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 093be9e..843536f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2677,16 +2677,17 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid "
+ "AND t.indisvalid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid "
+ "LIMIT 1;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
+ pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2712,7 +2713,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* every toast table has at least one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..0693e3d 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,28 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -100,7 +121,9 @@ extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
-extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+extern void index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
/*
+ * RelationGetIndexListIfValid
+ * Ensure the index list of a relation is computed, without
+ * recomputing it when still valid.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+ if (rel->rd_indexvalid == 0) \
+ RelationGetIndexList(rel); \
+} while(0)
+
+/*
* Routines to open (lookup) and close a relcache entry
*/
extern Relation RelationIdGetRelation(Oid relationId);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index db820d6..e77b058 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..a8b5fc9 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you can either drop the index and reissue the
+ <command>CREATE INDEX CONCURRENTLY</> command, or rebuild it in place
+ with <command>REINDEX CONCURRENTLY</>. Indexes of toast relations can
+ also be rebuilt with <command>REINDEX CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ &mdash; see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,119 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and, in
+ addition, it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index whose storage will replace the one
+ being rebuilt is first entered into the system catalogs in one transaction,
+ then two table scans are performed in two more transactions. Once this is
+ done, the old and new indexes are swapped: the new index is marked as valid,
+ the storage of the two indexes is exchanged, and the old index is marked as
+ invalid. An exclusive lock is taken during this swap phase. Finally, two
+ additional transactions are used to mark the swapped-out index as not ready
+ and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and retry <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending with
+ the suffix <literal>_cct</literal>. This also works for indexes of toast
+ relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved in the operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is done with a <literal>SHARE UPDATE EXCLUSIVE</literal>
+ lock, except during the relation swap, where an <literal>ACCESS EXCLUSIVE</literal>
+ lock is taken.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +402,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild the indexes of a table, allowing read and write operations on
+ the involved relations while the rebuild is in progress:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
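As a quick illustration of the recovery procedure documented above (the table and index names here are hypothetical), a failed concurrent rebuild leaves an invalid `*_cct` index that can simply be dropped before retrying:

```
REINDEX INDEX CONCURRENTLY idx;   -- fails, e.g. on a uniqueness violation
DROP INDEX idx_cct;               -- drop the leftover invalid concurrent index
REINDEX INDEX CONCURRENTLY idx;   -- retry once the duplicate data is fixed
```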
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 210ceda..73686f6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index as a duplicate of an existing
+ * index during a concurrent operation. This index can also be a toast
+ * index. Sufficient locks are assumed to be already taken on the
+ * related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild; there is no way to ask for it in the grammar otherwise.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1089,6 +1099,438 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into the catalogs and needs to be built
+ * later on. This is called during concurrent reindex processing. The heap
+ * relation on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, used to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the chosen name conflicts with any existing name, and
+ * adjust it if so.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, indOid);
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed, so that only schema changes are prevented.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in the
+ * commit of the transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old and new indexes in a concurrent context. For the time being,
+ * all that is done here is switching the relfilenode of the two indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ * Note: a weaker lock could be taken if the catalog cache lookups using
+ * SnapshotNow were correctly MVCC'd.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function, the index is
+ * seen as dead by all backends.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and then invalidate the relcache
+ * of its parent relation. In a concurrent context, this function should be
+ * called when initializing an index drop, before the index is set as dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of concurrent index
+ * processing. Deletion is done through performDeletion; otherwise,
+ * dependencies of the index would not get dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent options, as it is certain that they will
+ * not interact with other server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped here is not alive; if it were, it
+ * might still be used by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to keep live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1766,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1847,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1876,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1890,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -2990,27 +3361,32 @@ validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
- * flags that denote the index's state. We must use an in-place update of
- * the pg_index tuple, because we do not have exclusive lock on the parent
- * table and so other sessions might concurrently be doing SnapshotNow scans
- * of pg_index to identify the table's indexes. A transactional update would
- * risk somebody not seeing the index at all. Because the update is not
- * transactional and will not roll back on error, this must only be used as
- * the last step in a transaction that has not made any transactional catalog
- * updates!
+ * flags that denote the index's state. If this function is called in a
+ * concurrent process, we use an in-place update of the pg_index tuple,
+ * because we do not have exclusive lock on the parent table and so other
+ * sessions might concurrently be doing SnapshotNow scans of pg_index to
+ * identify the table's indexes. A transactional update would risk somebody
+ * not seeing the index at all. Because the update is not transactional
+ * and will not roll back on error, this must only be used as the last step
+ * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
-index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+ /*
+ * Assert that the current xact hasn't done any transactional updates;
+ * there is nothing to worry about in a non-concurrent context.
+ */
+ Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3070,8 +3446,20 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
- /* ... and write it back in-place */
- heap_inplace_update(pg_index, indexTuple);
+ /*
+ * Write it back in-place in a concurrent context, or do a simple
+ * transactional update in a non-concurrent context.
+ */
+ if (concurrent)
+ {
+ heap_inplace_update(pg_index, indexTuple);
+ }
+ else
+ {
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CommandCounterIncrement();
+ CatalogUpdateIndexes(pg_index, indexTuple);
+ }
heap_close(pg_index, RowExclusiveLock);
}
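The heart of the index.c changes above is the relfilenode exchange performed by index_concurrent_swap(). A minimal standalone sketch of just that exchange, using a hypothetical trimmed-down stand-in for the pg_class form (the real code does this on pg_class tuples under AccessExclusiveLock and writes them back with simple_heap_update):

```c
#include <assert.h>

typedef unsigned int Oid;

/* Hypothetical stand-in for Form_pg_class; only the field that the
 * swap touches is kept. */
typedef struct
{
	Oid		relfilenode;
} FakePgClassForm;

/* Mirrors the core of index_concurrent_swap(): exchange the relfilenode
 * of the old and new index, so the freshly built storage is served
 * under the old index's identity. */
static void
swap_relfilenode(FakePgClassForm *oldIndex, FakePgClassForm *newIndex)
{
	Oid		tmpnode = oldIndex->relfilenode;

	oldIndex->relfilenode = newIndex->relfilenode;
	newIndex->relfilenode = tmpnode;
}
```

In the patch, the concurrent index is then marked invalid and dropped, which removes the old storage now attached to it.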
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
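The column-name conflict loop in index_concurrent_create() above appends a numeric suffix while clipping the base name so the result stays under NAMEDATALEN. A self-contained sketch of that naming rule (the real code uses pg_mbcliplen() to clip at a multibyte character boundary; plain byte truncation stands in for it here, and the function name is invented for illustration):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAMEDATALEN 64			/* PostgreSQL's default name length */

/* Build origname + attempt number, clipping origname so the whole
 * result fits in NAMEDATALEN - 1 bytes, as the dedup loop in
 * index_concurrent_create() does. buf must hold NAMEDATALEN bytes. */
static void
make_nonconflicting_name(const char *origname, int attempt, char *buf)
{
	char	nbuf[32];
	int		nlen;

	snprintf(nbuf, sizeof(nbuf), "%d", attempt);

	/* Ensure generated names are shorter than NAMEDATALEN */
	nlen = (int) strlen(origname);
	if (nlen > NAMEDATALEN - 1 - (int) strlen(nbuf))
		nlen = NAMEDATALEN - 1 - (int) strlen(nbuf);

	memcpy(buf, origname, nlen);
	strcpy(buf + nlen, nbuf);
}
```

So "col" with attempt 1 becomes "col1", while a 100-byte name is clipped to 62 bytes before the "7" of attempt 7 is appended.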
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..2ea997f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,34 +681,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,79 +740,14 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
@@ -873,6 +775,544 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation
+ * can be either an index or a table. If a table is specified, each
+ * reindexing step is performed at once for all the table's indexes, as
+ * well as the indexes of its dependent toast table.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including the indexes of
+ * its associated toast table. If the relkind is an index, that index
+ * itself will be rebuilt. The locks taken on the parent relations and
+ * the involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index, a new index based on the same
+ * definition as the former one; it is only registered in the catalogs
+ * and will be built afterwards. All these operations can be performed
+ * at once for a parent relation, including the indexes of its toast
+ * relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; it might be a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is needed on
+ * it as well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid of each index relation to protect it from being
+ * dropped, then close the relations. The entries are palloc'd because
+ * the list outlives this loop iteration. The lockrelid of the parent
+ * relation is not taken here to avoid taking multiple locks on the
+ * same relation; instead we rely on parentRelationIds built earlier.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock tags for the wait phases that follow; other
+ * backends holding locks on these relations might conflict with this
+ * session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add the lockrelid of the parent relation to the list of locked relations */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG of this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close the heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The
+ * index is marked as not ready and invalid so that no other transaction
+ * will try to use it for INSERT or SELECT.
+ *
+ * Before committing, take a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid having transactions open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait on the parent
+ * relations until no running transaction could still have the parent
+ * table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start a new transaction for this concurrent index build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so
+ * reopen it and fetch what we need before closing it again.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ relOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(relOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row to mark the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible to other transactions.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table in the meantime.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * this concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not reflect tuples deleted just before the
+ * reference snapshot was taken, so we need to wait out the transactions
+ * that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The
+ * concurrent index is marked as valid before performing the swap, and
+ * is invalidated once the swap is done (at that point it holds the old
+ * index data), making it unusable by other backends once the
+ * transaction doing the swap commits.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation. An AccessExclusiveLock is
+ * taken here rather than a lower-level lock to reduce the likelihood
+ * of deadlock, as a ShareUpdateExclusiveLock is already held at
+ * session level.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that as an exclusive lock is taken on the relations
+ * involved, it is safe to call this function in a non-concurrent
+ * context.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Now that the swap is done, mark the old index as invalid.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the other
+ * indexes, and transactions might still be using them. Mark each of
+ * them as dead, waiting out those transactions first. Each operation
+ * is performed in a separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the LOCKTAG of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation before
+ * setting the index as dead.
+ */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes, which now hold the old index data. This
+ * needs to be done through performDeletion, or the related dependencies
+ * will not be dropped. The internal mechanism of DROP INDEX
+ * CONCURRENTLY is not used, as the indexes are already considered dead
+ * and invalid here, so they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the concurrent index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the parent
+ * tables and on the indexes of those tables.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get a fresh snapshot for the end of the process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1975,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +2002,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2121,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2200,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2245,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2260,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed on system catalogs, but it
+ * is on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system catalogs concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2352,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself might
+ * use them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
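To summarize the control flow added to indexcmds.c above, here is a hedged Python sketch of the six phases that ReindexRelationConcurrently() walks through for a single index. This is a toy model only, not the backend implementation; the dictionary-based catalog and the function name are invented for illustration. The relfilenode swap in phase 4 is what lets the original index name end up pointing at the freshly built data.

```python
# Toy model of the six-phase concurrent reindex protocol.  Each catalog
# entry tracks the on-disk file ("relfilenode") and the ready/valid flags.

def reindex_concurrently(catalog, index_name):
    old = catalog[index_name]
    # Phase 1: register an invalid, not-ready "_cct" copy of the index.
    cct_name = index_name + "_cct"
    catalog[cct_name] = {"relfilenode": "new_file", "valid": False, "ready": False}
    # Phase 2: build the copy, then mark it ready for inserts.
    catalog[cct_name]["ready"] = True
    # Phase 3: validate it against the heap (catch up on missed tuples).
    # Phase 4: mark it valid, swap relfilenodes, then invalidate the
    # entry that now holds the old data.
    catalog[cct_name]["valid"] = True
    old["relfilenode"], catalog[cct_name]["relfilenode"] = (
        catalog[cct_name]["relfilenode"], old["relfilenode"])
    catalog[cct_name]["valid"] = False
    # Phase 5: mark the copy (now holding the old file) as dead/not ready.
    catalog[cct_name]["ready"] = False
    # Phase 6: drop the copy; the original name keeps the new data.
    del catalog[cct_name]
    return catalog

cat = {"ind": {"relfilenode": "old_file", "valid": True, "ready": True}}
reindex_concurrently(cat, "ind")
assert cat["ind"]["relfilenode"] == "new_file"
```

In the real patch each phase runs in its own transaction with waits on conflicting lock holders in between; the sketch elides those waits to show only the catalog-state choreography.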
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 64d669b..d83e0b6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -899,6 +899,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when created in a concurrent context,
+ * and this code path cannot be taken by CREATE INDEX CONCURRENTLY, as
+ * that feature is not available for exclusion constraints; hence this
+ * code path can only be taken by REINDEX CONCURRENTLY. In that case the
+ * same index exists in parallel to this one, so we can bypass this
+ * check, as it has already been done on the other index. If exclusion
+ * constraints become supported by CREATE INDEX CONCURRENTLY in the
+ * future, this should be removed or revisited accordingly.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fd3823a..27408b4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3618,6 +3618,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0d82141..2d91451 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6761,29 +6761,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the given
+ * relations. To do this, inquire which xacts currently would conflict with
+ * lockmode on the relation referred to by each LOCKTAG -- ie, which ones
+ * have a lock that permits writing the relation -- and then wait for each
+ * of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, because it might not contain tuples deleted just before it was taken.
+ * Obtain a list of VXIDs of such transactions, and wait for them
+ * individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..0693e3d 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,28 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -100,7 +121,9 @@ extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
-extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+extern void index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
Is someone planning to provide additional feedback about this patch at some
point?
Yes, now that I have returned from my holidays - or well, am returning
from them, I do plan to. But it should probably get some implementation
level review from somebody other than Fujii and me...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
Is someone planning to provide additional feedback about this patch at some
point?
Yes, now that I have returned from my holidays - or well, am returning
from them, I do plan to. But it should probably get some implementation
level review from somebody but Fujii and me...
Yeah, it would be good to have an extra pair of fresh eyes looking at those
patches.
Thanks,
--
Michael
On Sun, Mar 24, 2013 at 12:37 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Sat, Mar 23, 2013 at 10:20 PM, Andres Freund <andres@2ndquadrant.com>
wrote:
On 2013-03-22 07:38:36 +0900, Michael Paquier wrote:
Is someone planning to provide additional feedback about this patch at
some point?
Yes, now that I have returned from my holidays - or well, am returning
from them, I do plan to. But it should probably get some implementation
level review from somebody other than Fujii and me...
Yeah, it would be good to have an extra pair of fresh eyes looking at those
patches.
I probably don't have enough time to review the patch thoroughly. It would be
quite helpful if someone else could become another reviewer of this patch.
Please find new patches realigned with HEAD. There were conflicts with commits done recently.
ISTM you failed to make the patches from your repository.
20130323_1_toastindex_v7.patch contains all the changes of
20130323_2_reindex_concurrently_v25.patch
Regards,
--
Fujii Masao
On Wed, Mar 27, 2013 at 3:05 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
ISTM you failed to make the patches from your repository.
20130323_1_toastindex_v7.patch contains all the changes of
20130323_2_reindex_concurrently_v25.patch
Oops, sorry, I hadn't noticed.
Please find correct versions attached (realigned with latest head at the
same time).
--
Michael
Attachments:
20130327_1_toastindex_v7.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..763c703 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -310,12 +310,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indrelid IN (SELECT reltoastrelid "
+ " FROM pg_class "
+ " WHERE oid >= %u "
+ " AND reltoastrelid != %u)",
+ FirstNormalObjectId, InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6c0ef5b..8ba390c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..e1af68d 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,13 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
@@ -1237,8 +1239,8 @@ static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1259,29 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple
+ * identical indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch the valid index to use */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1346,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
@@ -1367,7 +1383,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,7 +1401,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1440,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1468,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1495,14 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1511,22 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch the valid index to use */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1541,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1555,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1531,11 +1569,28 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,7 +1603,8 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1556,6 +1612,11 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ pfree(toastidxs);
+
return result;
}
@@ -1573,7 +1634,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
@@ -1591,8 +1652,8 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1668,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1686,21 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+
+ /* Fetch the valid index to use */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1719,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1808,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1750,8 +1826,8 @@ toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1850,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1895,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1947,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2044,36 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+/* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index from the list of indexes of a toast relation. The
+ * indexes need to be already opened prior to calling this routine.
+ */
+static Relation
+toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+{
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0b4c659..8114d77 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -768,7 +768,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7966558..210ceda 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1071,7 +1071,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1253,7 +1252,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indexrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index ef9c5f1..5ef164b 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1176,8 +1176,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1396,19 +1394,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can be done safely only if both toast
+ * relations have a single index.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- is_internal,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each pair of indexes */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ is_internal,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, which cannot
+ * have multiple indexes on their toast relation, simply raise
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1533,12 +1574,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1547,11 +1589,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName, true);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries have a suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName, true);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 536d232..64d669b 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8726,7 +8726,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8734,6 +8733,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8780,7 +8781,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ RelationGetIndexList(toastRel);
+ reltoastidxids = list_copy(toastRel->rd_indexlist);
+ relation_close(toastRel, NoLock);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8861,8 +8869,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index cb59f13..388685a 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -575,8 +575,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -588,7 +588,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the available indexes */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e4cf92a..1ab7e85 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2778,16 +2778,17 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
+ pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2813,7 +2814,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* every toast table has at least one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
/*
+ * RelationGetIndexListIfValid
+ * Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+ if (rel->rd_indexvalid == 0) \
+ RelationGetIndexList(rel); \
+} while(0)
+
+/*
* Routines to open (lookup) and close a relcache entry
*/
extern Relation RelationIdGetRelation(Oid relationId);
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indexrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130327_2_reindex_concurrently_v25.patch
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index db820d6..e77b058 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..a8b5fc9 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,119 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuilt is first entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is done,
+ the new index is marked as valid, the old and new indexes are swapped,
+ and the old index is marked as invalid. An exclusive lock is taken
+ during this phase. Finally, two additional transactions are used to mark
+ the old index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and run <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This also applies to indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved in the operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is done with a <literal>SHARE UPDATE EXCLUSIVE</literal>
+ lock, except during the relation swap, where an <literal>ACCESS EXCLUSIVE</literal>
+ lock is taken.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +402,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild the indexes of a table, allowing read and write operations on the
+ involved relations while the rebuild is in progress:
+
+<programlisting>
+REINDEX TABLE my_broken_table CONCURRENTLY;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 210ceda..73686f6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create a duplicate of an existing index, for use
+ * during a concurrent reindex operation. This index can also belong to a
+ * toast relation. Sufficient locks are assumed to already be held on the
+ * related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently supported only during a concurrent index
+ * rebuild; there is no way to ask for it in the grammar otherwise.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1089,6 +1099,438 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent reindex processing. The heap relation
+ * on which the index is based must be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initially deferred; this depends on
+ * its parent constraint, if any.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, needed to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check whether the chosen name conflicts with any existing column
+ * names, and adjust it if so.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Only low-level locks are taken
+ * here, sufficient to prevent schema changes while still allowing reads and
+ * writes of the parent relation.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in
+ * commit of transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old and new indexes in a concurrent context. For the time being,
+ * this only switches the relfilenode of the two indexes. If extra operations
+ * become necessary during a concurrent swap, they should be added here.
+ * AccessExclusiveLock is taken on the swapped index relations and held until
+ * the end of the transaction in which this function is called.
+ * Note: a lower-level lock could be used if the catalog caches were properly
+ * MVCC-safe instead of relying on SnapshotNow.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. This function should be called when initiating an
+ * index drop in a concurrent context, before the index is set dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent reindex process.
+ * Deletion is done through performDeletion, as otherwise the dependencies
+ * of the index would not be dropped. At this point the index is already
+ * considered invalid and dead, so it can be dropped without any concurrent
+ * options, since it is certain that it will not interact with other
+ * server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it might
+ * still be used by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * The index is known to be dead, so begin the drop process by
+ * registering the constraint or the index itself for deletion.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1766,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1847,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1876,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1890,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -2990,27 +3361,32 @@ validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
- * flags that denote the index's state. We must use an in-place update of
- * the pg_index tuple, because we do not have exclusive lock on the parent
- * table and so other sessions might concurrently be doing SnapshotNow scans
- * of pg_index to identify the table's indexes. A transactional update would
- * risk somebody not seeing the index at all. Because the update is not
- * transactional and will not roll back on error, this must only be used as
- * the last step in a transaction that has not made any transactional catalog
- * updates!
+ * flags that denote the index's state. If this function is called in a
+ * concurrent process, we use an in-place update of the pg_index tuple,
+ * because we do not have exclusive lock on the parent table and so other
+ * sessions might concurrently be doing SnapshotNow scans of pg_index to
+ * identify the table's indexes. A transactional update would risk somebody
+ * not seeing the index at all. Because the update is not transactional
+ * and will not roll back on error, this must only be used as the last step
+ * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
-index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+ /*
+ * Assert that the current xact hasn't done any transactional updates;
+ * in a non-concurrent context there is nothing to worry about.
+ */
+ Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3070,8 +3446,20 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
- /* ... and write it back in-place */
- heap_inplace_update(pg_index, indexTuple);
+ /*
+ * Write it back in-place in a concurrent context, and do a simple update
+ * for a non-concurrent context.
+ */
+ if (concurrent)
+ {
+ heap_inplace_update(pg_index, indexTuple);
+ }
+ else
+ {
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CommandCounterIncrement();
+ CatalogUpdateIndexes(pg_index, indexTuple);
+ }
heap_close(pg_index, RowExclusiveLock);
}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..2ea997f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,34 +681,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,79 +740,14 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
@@ -873,6 +775,544 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation
+ * can be either an index or a table. If a table is specified, each
+ * reindexing step is applied to all of the table's indexes at once,
+ * including its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * list of relation Oids given by the caller. If the relkind of a given
+ * relation Oid is a table, all of its valid indexes will be rebuilt,
+ * including the indexes of its associated toast table. If the relkind is
+ * an index, that index itself will be rebuilt. The locks taken on the
+ * parent relations and the involved indexes are kept until this
+ * transaction is committed, to protect against schema changes that might
+ * occur before the session lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* Shared relations cannot be reindexed concurrently */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be processed concurrently.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Error if the relation type is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. First,
+ * for each index, a new index based on the same definition as the old one
+ * is created; at this stage it is only registered in the catalogs and
+ * will be built later. All these operations can be performed at once for
+ * a parent relation, including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation; it might be a plain or a toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the concurrent index relation; a lock is needed on it as
+ * well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid of each index to protect it from being dropped,
+ * then close the relations. palloc'd copies must be stored here, as
+ * the list outlives this loop iteration. The lockrelid of the parent
+ * relation is not stored here, to avoid taking multiple locks on the
+ * same relation; we rely on parentRelationIds built earlier instead.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lockrelid of each parent relation: the session locks
+ * prevent the relations from being dropped, and the LOCKTAGs are
+ * needed for the wait phases, where other backends' transactions
+ * might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a palloc'd copy of the lockrelid to the list of locked relations */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The new
+ * index is marked as not ready and invalid so that no other transaction
+ * will try to use it for INSERT or SELECT.
+ *
+ * Before committing, take a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * is dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping a transaction open for an unnecessarily long time. Each
+ * concurrent build replaces one of the old indexes. Before building,
+ * we need to wait until no running transaction could still have the
+ * parent table open with the old set of indexes.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by the previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /*
+ * Perform the concurrent build of the new index. Note that indexRel
+ * is closed only afterwards, as its parent Oid is still needed here.
+ */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any tuples
+ * inserted in the parent table while the indexes were not yet ready.
+ *
+ * We once again wait until no transaction can have the table open
+ * with the concurrent index not yet marked as ready for inserts.
+ * Each index validation is done in a separate transaction to avoid
+ * keeping a transaction open for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * this concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid, as it contains all the tuples
+ * necessary. However, it might not contain tuples deleted just before
+ * the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, each one
+ * needs to be swapped with its corresponding old index. The concurrent
+ * index is marked as valid before the swap; after the swap, the index
+ * holding the old data is marked as invalid, so that no other backend
+ * can use it once the swapping transaction is committed.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation. An AccessExclusiveLock is
+ * taken here rather than a lower-level lock to reduce the likelihood
+ * of deadlock, as a ShareUpdateExclusiveLock is already held on these
+ * relations at session level.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that as an exclusive lock is taken on the relations
+ * involved, the state flags can be safely updated in a non-concurrent
+ * way.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * The swap is done, so mark as invalid the index that now holds the
+ * old data; after the swap it is referred to by the Oid of the
+ * concurrent index.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The indexes that went through the swap now hold the old relfilenodes.
+ * They need to be marked as dead, after waiting for the transactions
+ * that might still use them. Each operation is performed in a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the LOCKTAG of the parent table for this index; we need to
+ * wait for locks on it before marking the index as dead.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set the index as dead. Note that
+ * it is necessary to wait for virtual locks on the parent relation
+ * before setting the index as dead.
+ */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes, which now hold the old data. This needs
+ * to go through performDeletion, or the dependencies of the old indexes
+ * would not be dropped. The internal mechanism of DROP INDEX
+ * CONCURRENTLY is not used here, as the indexes are already dead and
+ * invalid, so no other backend will use them.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is to release the session-level locks taken on
+ * the parent relations and the indexes of the table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1975,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +2002,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2121,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2200,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2245,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2260,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed on the system catalogs, but
+ * it is on the user relations of a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system catalogs concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2352,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or a plain process. System
+ * relations cannot be reindexed concurrently, but they still need to
+ * be reindexed with a plain process (pg_class included), as they
+ * could be corrupted and the concurrent process itself relies on
+ * them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 64d669b..d83e0b6 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -899,6 +899,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and this code path cannot be reached by CREATE INDEX
+ * CONCURRENTLY, as that feature is not available for exclusion
+ * constraints. Hence this code path can only be taken by REINDEX
+ * CONCURRENTLY, in which case the same index exists in parallel to
+ * this one, so the check can be bypassed here: it has already been
+ * done on the other index. If exclusion constraints are supported in
+ * the future by CREATE INDEX CONCURRENTLY, this should be removed or
+ * completed accordingly.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fd3823a..27408b4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3618,6 +3618,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0d82141..2d91451 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6761,29 +6761,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a lock conflicting with the given
+ * lockmode on any of the relations referred to by the given LOCKTAGs.
+ * To do this, inquire which xacts currently would conflict with lockmode
+ * on each relation -- ie, which ones have a lock that permits writing the
+ * relation -- then wait for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for the transactions that might have an older snapshot than the
+ * given one, as the given snapshot might not contain tuples that were
+ * deleted just before it was taken. Obtain a list of VXIDs of such
+ * transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..0693e3d 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,28 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -100,7 +121,9 @@ extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
-extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+extern void index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On Wed, Mar 27, 2013 at 8:26 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Mar 27, 2013 at 3:05 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
ISTM you failed to make the patches from your repository.
20130323_1_toastindex_v7.patch contains all the changes of
20130323_2_reindex_concurrently_v25.patch
Oops, sorry, I hadn't noticed.
Please find correct versions attached (realigned with latest head at the
same time).
Thanks!
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ RelationGetIndexList(toastRel);
+ reltoastidxids = list_copy(toastRel->rd_indexlist);
+ relation_close(toastRel, NoLock);
list_copy() seems not to be required here. We can just set reltoastidxids to
the return list of RelationGetIndexList().
Since we call relation_open() with lockmode, ISTM that we should also call
relation_close() with the same lockmode instead of NoLock. No?
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
Since idxid is the pg_index.indexrelid, ISTM it should never be invalid.
If this is true, the check of OidIsValid(idxid) is not required.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Thanks for the comments. Please find updated patches attached.
On Thu, Mar 28, 2013 at 3:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ RelationGetIndexList(toastRel);
+ reltoastidxids = list_copy(toastRel->rd_indexlist);
+ relation_close(toastRel, NoLock);
list_copy() seems not to be required here. We can just set reltoastidxids to
the return list of RelationGetIndexList().
Good catch. I thought I had taken care of such things in all the places in
previous versions.
Since we call relation_open() with lockmode, ISTM that we should also call
relation_close() with the same lockmode instead of NoLock. No?
Agreed on that.
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid idxid = lfirst_oid(lc);
+ if (OidIsValid(idxid))
+ ATExecSetTableSpace(idxid, newTableSpace, lockmode);
Since idxid is the pg_index.indexrelid, ISTM it should never be invalid.
If this is true, the check of OidIsValid(idxid) is not required.
Indeed...
--
Michael
Attachments:
20130328_1_toastindex_v8.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index a5aa40f..763c703 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -310,12 +310,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indrelid IN (SELECT reltoastrelid "
+ " FROM pg_class "
+ " WHERE oid >= %u "
+ " AND reltoastrelid != %u)",
+ FirstNormalObjectId, InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 6c0ef5b..8ba390c 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..e12d1c1 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..e1af68d 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,13 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
@@ -1237,8 +1239,8 @@ static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1259,29 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of the toast relation with the same lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch the valid index relation used for processing */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1346,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
@@ -1367,7 +1383,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,7 +1401,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1440,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1449,8 +1468,10 @@ toast_save_datum(Relation rel, Datum value,
/*
* Done - close toast relation
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1495,14 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1511,22 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index, but we still need to take
+ * a lock on all of them.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch the valid index relation used for processing */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1541,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1555,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1531,11 +1569,28 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,7 +1603,8 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1556,6 +1612,11 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ pfree(toastidxs);
+
return result;
}
@@ -1573,7 +1634,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
@@ -1591,8 +1652,8 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1668,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1686,21 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of the toast relation with the same lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+
+ /* Fetch the valid index relation used for processing */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1719,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1808,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1750,8 +1826,8 @@ toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1850,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1895,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1947,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2044,36 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+/* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index in the list of indexes for a toast relation. Those
+ * relations need to be already open prior to calling this routine.
+ */
+static Relation
+toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+{
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 0b4c659..8114d77 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -768,7 +768,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7966558..210ceda 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1071,7 +1071,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1253,7 +1252,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1763,8 +1761,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1780,8 +1776,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1875,15 +1872,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2071,14 +2059,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f727acd..01d58d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index ef9c5f1..5ef164b 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1176,8 +1176,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1396,19 +1394,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can actually be safely done only if the
+ * relations have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- is_internal,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ is_internal,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, which cannot
+ * have multiple indexes on their toast relation, simply raise
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1533,12 +1574,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1547,11 +1589,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName, true);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries have a suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName, true);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
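The renaming loop above follows a simple scheme: the first toast index keeps the historical `pg_toast_<oid>_index` name, and any additional index (left over from a concurrent rebuild) gets a numeric suffix. A minimal standalone sketch of that scheme; `make_toast_index_name` is a hypothetical helper, not part of the patch:

```c
#include <stdio.h>

#define NAMEDATALEN 64          /* matches the PostgreSQL default */

/*
 * Sketch of the naming scheme in finish_heap_swap(): index 0 keeps the
 * historical name, later indexes get "_<count>" appended.
 */
static const char *
make_toast_index_name(char *buf, unsigned int heap_oid, int count)
{
    if (count == 0)
        snprintf(buf, NAMEDATALEN, "pg_toast_%u_index", heap_oid);
    else
        snprintf(buf, NAMEDATALEN, "pg_toast_%u_index_%d", heap_oid, count);
    return buf;
}
```

This keeps existing tooling that expects `pg_toast_<oid>_index` working while still giving every extra index a unique name.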
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 536d232..2a93596 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8726,7 +8726,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8734,6 +8733,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8780,7 +8781,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8861,8 +8869,11 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ ATExecSetTableSpace(lfirst_oid(lc), newTableSpace, lockmode);
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index cb59f13..388685a 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -575,8 +575,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -588,7 +588,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index d589d26..86ab62a 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is evaluated using all the indexes available */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
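The new accounting in calculate_toast_table_size() amounts to a double sum: the toast heap's forks plus every fork of every index on the toast relation, instead of the single index that reltoastidxid used to point to. A minimal sketch with illustrative stand-in values; `toast_total_size` is a hypothetical helper, not PostgreSQL code:

```c
#define MAX_FORKNUM 3           /* main, fsm, vm, init */

/*
 * Sum the heap's fork sizes, then every fork of every index, mirroring
 * the loop structure of the patched calculate_toast_table_size().
 */
static long long
toast_total_size(long long heap_fork_sizes[MAX_FORKNUM + 1],
                 long long idx_fork_sizes[][MAX_FORKNUM + 1], int nindexes)
{
    long long size = 0;
    int fork, i;

    for (fork = 0; fork <= MAX_FORKNUM; fork++)
        size += heap_fork_sizes[fork];

    for (i = 0; i < nindexes; i++)
        for (fork = 0; fork <= MAX_FORKNUM; fork++)
            size += idx_fork_sizes[i][fork];

    return size;
}
```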
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index aa6993a..125809f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2780,16 +2780,17 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
+ pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2815,7 +2816,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* every toast table has at least one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index fd97141..ea46e38 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -93,7 +92,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 28
+#define Natts_pg_class 27
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -106,22 +105,21 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relfrozenxid 25
-#define Anum_pg_class_relminmxid 26
-#define Anum_pg_class_relacl 27
-#define Anum_pg_class_reloptions 28
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relfrozenxid 24
+#define Anum_pg_class_relminmxid 25
+#define Anum_pg_class_relacl 26
+#define Anum_pg_class_reloptions 27
/* ----------------
* initial contents of pg_class
@@ -136,13 +134,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 28 0 t f f f f 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
/*
+ * RelationGetIndexListIfValid
+ * Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+ if (rel->rd_indexvalid == 0) \
+ RelationGetIndexList(rel); \
+} while(0)
+
+/*
* Routines to open (lookup) and close a relcache entry
*/
extern Relation RelationIdGetRelation(Oid relationId);
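The RelationGetIndexListIfValid macro added above is a small memoization guard: recompute the cached index list only when the validity flag is unset. A generic standalone sketch of the same pattern; the struct and function names are illustrative stand-ins for RelationData and RelationGetIndexList, not PostgreSQL APIs:

```c
/*
 * A cached value plus a validity flag, refreshed lazily by a macro
 * wrapped in do { } while (0) so it behaves like a single statement.
 */
struct cached_list
{
    int valid;                  /* plays the role of rd_indexvalid */
    int nitems;                 /* plays the role of rd_indexlist */
};

static int refresh_calls = 0;

static void
refresh(struct cached_list *c)
{
    refresh_calls++;            /* count recomputations for illustration */
    c->nitems = 1;
    c->valid = 1;
}

#define GET_LIST_IF_VALID(c) \
do { \
    if ((c)->valid == 0) \
        refresh(c); \
} while (0)
```

Calling the macro repeatedly triggers only one recomputation, which is the point of the relcache helper: callers that merely need the cache populated avoid rebuilding the list on every access.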
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index a4ecfd2..7a68fb9 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index 4f49a0d..2d24961 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130328_2_reindex_concurrently_v25.patch (application/octet-stream)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index db820d6..e77b058 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..a8b5fc9 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. The same applies to <literal>UNIQUE</> indexes
+ created by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
@@ -139,6 +151,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +258,119 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is done,
+ the old and the fresh index are swapped: the concurrent index is marked
+ as valid, the two indexes exchange their files, and the old one is marked
+ as invalid. An exclusive lock is taken during this phase. Finally, two
+ additional transactions are used to mark the concurrent index as not ready
+ and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending with
+ the suffix <literal>_cct</>. This works as well with indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds concurrently only the non-system relations. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved during the operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is done with a <literal>SHARE UPDATE EXCLUSIVE</literal>
+ lock, except during the relation swap where an <literal>ACCESS EXCLUSIVE</literal>
+ lock is taken.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +402,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild the indexes of a table, allowing read and write operations on the
+ relations involved while the rebuild is in progress:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 210ceda..73686f6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be a toast relation. Sufficient locks are normally already taken
+ * on the related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild; there is no way to ask for it in the grammar otherwise
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1089,6 +1099,438 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as the former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initdeferred; this depends on its
+ * dependent constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get expressions associated with this index for building the column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the picked name conflicts with any existing name, and
+ * change it if necessary.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, RowExclusiveLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
+/*
+ * index_concurrent_build
+ *
+ * Build an index in a concurrent context. Only low-level locks are taken
+ * during this operation, so that schema changes are blocked while the
+ * build runs but reads and writes are not.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost when the
+ * transaction that created this concurrent index at the catalog level
+ * committed.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap old index and new index in a concurrent context. For the time being
+ * this only switches the relfilenodes of the two indexes. If extra
+ * operations become necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the swapped index
+ * relations until the end of the transaction in which this function is
+ * called.
+ * Note: a lower-level lock could be taken here if the catalog caches,
+ * which rely on SnapshotNow, were correctly MVCC-safe.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+{
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could still be using
+ * the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. When initiating a concurrent index drop, this
+ * function should be called before the index is set dead.
+ */
+void
+index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+{
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+}
+
+/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent index processing.
+ * Deletion is done through performDeletion; otherwise the dependencies of
+ * the index would not get dropped. At this point the index is already
+ * considered invalid and dead, so it can be dropped without any concurrent
+ * option, as it is certain that no other server session is interacting
+ * with it.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not live; if it were, other
+ * backends might still be using it.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
/*
* index_constraint_create
*
@@ -1324,7 +1766,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1406,17 +1847,8 @@ index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /* Mark the index as invalid */
+ index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
@@ -1444,63 +1876,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
- */
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
@@ -1513,13 +1890,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
@@ -2990,27 +3361,32 @@ validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
- * flags that denote the index's state. We must use an in-place update of
- * the pg_index tuple, because we do not have exclusive lock on the parent
- * table and so other sessions might concurrently be doing SnapshotNow scans
- * of pg_index to identify the table's indexes. A transactional update would
- * risk somebody not seeing the index at all. Because the update is not
- * transactional and will not roll back on error, this must only be used as
- * the last step in a transaction that has not made any transactional catalog
- * updates!
+ * flags that denote the index's state. If this function is called in a
+ * concurrent process, we use an in-place update of the pg_index tuple,
+ * because we do not have exclusive lock on the parent table and so other
+ * sessions might concurrently be doing SnapshotNow scans of pg_index to
+ * identify the table's indexes. A transactional update would risk somebody
+ * not seeing the index at all. Because the update is not transactional
+ * and will not roll back on error, this must only be used as the last step
+ * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
-index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
+index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
- /* Assert that current xact hasn't done any transactional updates */
- Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
+ /*
+ * Assert that the current xact hasn't done any transactional updates;
+ * in a non-concurrent context there is nothing to worry about.
+ */
+ Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
@@ -3070,8 +3446,20 @@ index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
- /* ... and write it back in-place */
- heap_inplace_update(pg_index, indexTuple);
+ /*
+ * Write it back in-place in a concurrent context, and do a simple update
+ * for a non-concurrent context.
+ */
+ if (concurrent)
+ {
+ heap_inplace_update(pg_index, indexTuple);
+ }
+ else
+ {
+ simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
+ CommandCounterIncrement();
+ CatalogUpdateIndexes(pg_index, indexTuple);
+ }
heap_close(pg_index, RowExclusiveLock);
}
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f855bef..2ea997f 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -311,7 +312,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -320,13 +320,9 @@ DefineIndex(IndexStmt *stmt,
int16 *coloptions;
IndexInfo *indexInfo;
int numberOfAttributes;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -453,7 +449,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -600,7 +597,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -663,18 +660,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -694,34 +681,20 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
@@ -738,13 +711,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -773,79 +740,14 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(snapshot);
/*
* Index can now be marked valid -- update its pg_index entry
*/
- index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
+ index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
@@ -873,6 +775,544 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation
+ * can be either an index or a table. If a table is specified, each step of
+ * the reindexing process is performed on all of the table's indexes at
+ * once, including its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including the
+ * indexes of its associated toast table. If the relkind is an index, the
+ * index itself will be rebuilt. The locks taken on the parent relations
+ * and the involved indexes are kept until this transaction is committed,
+ * to protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The parent relation of an index cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid
+ * indexes cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. For
+ * each index, we first need to create a new index based on the same
+ * definition as the old one; at this stage it is only registered in the
+ * catalogs and will be built afterwards. Each operation can be performed
+ * on all the indexes of a parent relation at the same time, including
+ * the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation, which may be a plain or toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock on it is
+ * needed as well
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelids to protect each index relation from being dropped,
+ * then close the relations. The lockrelid of the parent relation is not
+ * taken here, to avoid multiple locks on the same relation; instead we
+ * rely on parentRelationIds built earlier. The list entries must be
+ * palloc'd copies: appending the address of the local variable would
+ * leave every entry pointing at the same storage.
+ */
+ {
+ LockRelId *lockId = (LockRelId *) palloc(sizeof(LockRelId));
+
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ *lockId = lockrelid;
+ relationLocks = lappend(relationLocks, lockId);
+
+ lockId = (LockRelId *) palloc(sizeof(LockRelId));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ *lockId = lockrelid;
+ relationLocks = lappend(relationLocks, lockId);
+ }
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locktags for the wait phases that follow, where other
+ * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add the lockrelid of the parent relation to the list of locked
+ * relations; a palloc'd copy is appended, as the list must outlive
+ * this loop iteration.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid, so that no other transaction will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build concurrent indexes in a separate transaction for each index to
+ * avoid having open transactions for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * the old indexes. Before doing that, we need to wait until no running
+ * transaction could still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so
+ * reopen it to fetch the information needed for the build, then
+ * close it again; it must not be dereferenced once closed.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+ relOid = indexRel->rd_index->indrelid;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(relOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible to other transactions.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with the INSERTs
+ * that might have occurred in the parent table in the meantime.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * this concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * The concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not have taken into account tuples
+ * deleted before the reference snapshot was taken, so we need to wait
+ * for the transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(snapshot);
+
+ /* we can now do away with our active snapshot */
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. Each
+ * concurrent index is marked as valid before performing the swap, and
+ * each old index is invalidated once its swap is done, making it
+ * unusable by other backends once the associated transaction commits.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the relations so that their relation cache entries can be
+ * invalidated. An AccessExclusiveLock, and not a lower lock, is
+ * taken here to reduce the likelihood of deadlock, as a
+ * ShareUpdateExclusiveLock is already held at session level.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that, as an exclusive lock is taken on the relations
+ * involved, it is safe to call this function in a non-concurrent
+ * context.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Now that the swap is done, mark as invalid the index holding the
+ * old data.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the other
+ * indexes. It is time to mark them as dead so that no transaction can
+ * still use them. Each operation is performed in a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table of this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation before
+ * setting the index as dead.
+ */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies of the old indexes would not be
+ * dropped. The internal mechanism of DROP INDEX CONCURRENTLY is not
+ * used here, as the indexes are already considered dead and invalid,
+ * so they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Perform the drop of this concurrent index */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent table
+ * and the indexes of table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get a fresh snapshot for the end of the process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1535,7 +1975,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1561,6 +2002,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1673,18 +2121,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1748,18 +2200,33 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
}
}
+
/*
* ReindexTable
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1778,7 +2245,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1790,6 +2260,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed on system catalogs, but it
+ * is allowed on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1873,15 +2352,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed, including pg_class, with a normal process:
+ * they could be corrupted, and the concurrent process itself relies
+ * on them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2a93596..0a15656 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -899,6 +899,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
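The relaxation above matters because an interrupted REINDEX CONCURRENTLY can leave invalid indexes behind that the user must be able to drop. A sketch of a manual cleanup session (relying only on the standard pg_index columns; the leftover index name is hypothetical):

```sql
-- List indexes left invalid, e.g. by an interrupted concurrent reindex
SELECT indexrelid::regclass AS index,
       indrelid::regclass AS parent_table,
       indisvalid, indisready
FROM pg_index
WHERE NOT indisvalid;

-- A leftover index with the _cct suffix can then be dropped manually
DROP INDEX concur_reindex_ind1_cct;
```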
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 11be62e..c46bdcc 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1185,6 +1185,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context. As this code path cannot be taken by CREATE INDEX
+ * CONCURRENTLY, which does not support exclusion constraints, it can
+ * only be reached by REINDEX CONCURRENTLY. In that case the same
+ * index exists in parallel to this one, so we can bypass this check:
+ * it has already been done on the other index. If exclusion
+ * constraints become supported by CREATE INDEX CONCURRENTLY in the
+ * future, this will need to be removed or extended for that purpose.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index fd3823a..27408b4 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3618,6 +3618,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 085cd5b..2687bf0 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1853,6 +1853,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0d82141..2d91451 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6761,29 +6761,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
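For reference, the grammar rules above accept the following forms (opt_force corresponds to the historical, no-op FORCE keyword; object names are placeholders):

```sql
REINDEX INDEX CONCURRENTLY my_index;
REINDEX TABLE CONCURRENTLY my_table;
REINDEX DATABASE CONCURRENTLY my_database;
REINDEX SYSTEM CONCURRENTLY my_database;  -- parses, but is rejected later
```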
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index 4308128..1662a6e 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2528,6 +2528,152 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a lock conflicting with lockmode on
+ * any of the relations referred to by the given LOCKTAGs. To do this,
+ * inquire which xacts currently would conflict with each lock -- ie,
+ * which ones have a lock that permits writing the relation -- then wait
+ * for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * one, as such a snapshot might not contain tuples deleted just before
+ * the given snapshot was taken. Obtain a list of VXIDs of such
+ * transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin of
+ * our reference snapshot; their oldest snapshot must be newer than ours.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(Snapshot snapshot)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(snapshot->xmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(snapshot->xmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
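At the SQL level, the set of transactions that WaitForMultipleVirtualLocks would wait on can be approximated by inspecting pg_locks (illustrative only; the C code queries the lock manager directly through GetLockConflicts):

```sql
-- Backends holding or awaiting a lock on the parent table; any granted
-- lock conflicting with the requested mode triggers a wait
SELECT virtualtransaction, mode, granted
FROM pg_locks
WHERE locktype = 'relation'
  AND relation = 'concur_reindex_tab'::regclass;
```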
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index a1c03f1..6a0341b 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1292,16 +1292,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -1313,8 +1317,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..0693e3d 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,7 +60,28 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
+
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
+extern void index_concurrent_set_dead(Oid indexId,
+ Oid heapId,
+ LOCKTAG locktag);
+
+extern void index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent);
+
+extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
@@ -100,7 +121,9 @@ extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
-extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
+extern void index_set_state_flags(Oid indexId,
+ IndexStateFlagsAction action,
+ bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 62515b2..54137c6 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 2229ef0..bb3ae47 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2538,6 +2538,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index d5fdfea..d4a0981 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -76,4 +76,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(Snapshot snapshot);
+
#endif /* PROCARRAY_H */
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 2ae991e..23fff1f 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 914e7a5..a338794 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2013-03-28 10:18:45 +0900, Michael Paquier wrote:
On Thu, Mar 28, 2013 at 3:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Since we call relation_open() with lockmode, ISTM that we should also call relation_close() with the same lockmode instead of NoLock. No?
Agreed on that.
That doesn't really hold true generally; it's often sensible to hold the
lock till the end of the transaction, which is what not specifying a
lock at close implies.
Greetings,
Andres Freund
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-03-19 08:57:31 +0900, Michael Paquier wrote:
On Tue, Mar 19, 2013 at 3:24 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Mar 13, 2013 at 9:04 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I have been working on improving the code of the 2 patches:
1) reltoastidxid removal: <snip>
- Fix a bug with pg_dump and binary upgrade. One valid index is necessary
for a given toast relation.

Is this bugfix related to the following?

appendPQExpBuffer(upgrade_query,
-    "SELECT c.reltoastrelid, t.reltoastidxid "
+    "SELECT c.reltoastrelid, t.indexrelid "
     "FROM pg_catalog.pg_class c LEFT JOIN "
-    "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
-    "WHERE c.oid = '%u'::pg_catalog.oid;",
+    "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+    "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+    "LIMIT 1",

Yes.
Don't indisready and indislive need to be checked?
An index is valid if it is already ready and live. We could add such a check
for safety, but I don't think it is necessary.
Note that that's not true for 9.2: live && !ready represents isdead there, since
the need for that was only recognized after the release.
Greetings,
Andres Freund
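For reference, the index state flags discussed above can be inspected directly in pg_index. This is only an illustrative sketch against a 9.3-era catalog; 'my_table' is a placeholder name, not something from this thread:

```sql
-- Show the state flags of every index on a table's toast relation.
SELECT i.indexrelid::regclass,
       i.indisvalid,   -- safe to use for queries
       i.indisready,   -- receives new tuples from writers
       i.indislive     -- false once the index is being dropped
FROM pg_index i
WHERE i.indrelid = (SELECT reltoastrelid
                    FROM pg_class
                    WHERE oid = 'my_table'::regclass);
```

During a REINDEX CONCURRENTLY the transient "_cct" index passes through these flag combinations, which is why an interrupted run can leave an invalid index visible here.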
On Thu, Mar 28, 2013 at 10:34 AM, Andres Freund <andres@anarazel.de> wrote:
On 2013-03-28 10:18:45 +0900, Michael Paquier wrote:
On Thu, Mar 28, 2013 at 3:12 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Since we call relation_open() with lockmode, ISTM that we should also call relation_close() with the same lockmode instead of NoLock. No?
Agreed on that.
That doesn't really hold true generally; it's often sensible to hold the
lock till the end of the transaction, which is what not specifying a
lock at close implies.
You're right. Even if we release the lock there, the lock is taken again soon
and held till the end of the transaction. There is no need to release the lock
there.
Regards,
--
Fujii Masao
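The locking idiom being agreed on above can be sketched as follows. This is illustrative backend-style C, not code from the patch; the relid variable and lock level are placeholders:

```c
/* Sketch of the pattern discussed: open the relation with an
 * explicit lock, then close with NoLock so the lock is retained
 * until the end of the transaction instead of being released early. */
Relation rel = relation_open(relid, ShareUpdateExclusiveLock);

/* ... perform the concurrent reindex work on the relation ... */

relation_close(rel, NoLock);    /* lock held until transaction commit */
```

Passing NoLock to relation_close() only skips the unlock; the heavyweight lock acquired at open time stays registered with the transaction and is released at commit or abort.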
Hi,
I moved this patch to the next commit fest.
Thanks,
--
Michael
Hi all,
Please find attached the latest versions of REINDEX CONCURRENTLY for the
1st commit fest of 9.4:
- 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid to
allow a toast relation to have multiple indexes in parallel (extra
indexes can be created while a REINDEX CONCURRENTLY is being processed)
- 20130606_2_reindex_concurrently_v26.patch, correcting some comments and
fixing a lock taken in index_concurrent_create on an index relation that
was not released at the end of the transaction
Those patches have been generated with context diffs...
Regards,
--
Michael
Attachments:
20130606_1_remove_reltoastidxid_v9.patch
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
***************
*** 321,332 **** get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT reltoastidxid "
! "FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
--- 321,337 ----
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT indexrelid "
! "FROM pg_index "
! "WHERE indrelid IN (SELECT reltoastrelid "
! " FROM pg_class "
! " WHERE oid >= %u "
! " AND reltoastrelid != %u)",
! FirstNormalObjectId, InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 1745,1759 ****
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
--- 1745,1750 ----
*** a/doc/src/sgml/diskusage.sgml
--- b/doc/src/sgml/diskusage.sgml
***************
*** 44,50 ****
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
--- 44,50 ----
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
***************
*** 65,76 **** FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT reltoastidxid
! FROM pg_class
! WHERE oid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
--- 65,76 ----
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT indexrelid
! FROM pg_index
! WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
***************
*** 87,93 **** WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
--- 87,93 ----
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
***************
*** 101,107 **** SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
--- 101,107 ----
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***************
*** 76,86 **** do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
/* ----------
--- 76,88 ----
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel,
! Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+ static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
***************
*** 1237,1244 **** static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
! Relation toastrel;
! Relation toastidx;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
--- 1239,1246 ----
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
***************
*** 1257,1271 **** toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
--- 1259,1287 ----
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID. A toast table can have multiple identical
! * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfValid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /* Open all the indexes of toast relation with similar lock */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
!
! /* Fetch relation used for process */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
***************
*** 1330,1336 **** toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
}
else
--- 1346,1352 ----
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
***************
*** 1367,1373 **** toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
--- 1383,1390 ----
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid,
! RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
***************
*** 1384,1390 **** toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
--- 1401,1407 ----
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
***************
*** 1423,1438 **** toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! index_insert(toastidx, t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidx->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
--- 1440,1457 ----
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table for all the
! * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! for (i = 0; i < num_indexes; i++)
! index_insert(toastidxs[i], t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidxs[i]->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
***************
*** 1447,1456 **** toast_save_datum(Relation rel, Datum value,
}
/*
! * Done - close toast relation
*/
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
* Create the TOAST pointer value that we'll return
--- 1466,1477 ----
}
/*
! * Done - close toast relations
*/
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
***************
*** 1474,1484 **** toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
if (!VARATT_IS_EXTERNAL(attr))
return;
--- 1495,1508 ----
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
***************
*** 1487,1496 **** toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1511,1532 ----
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! RelationGetIndexListIfValid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /*
! * We actually use only the first valid index but taking a lock on all is
! * necessary.
! */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
!
! /* Fetch relation used for process */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1505,1511 **** toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1541,1547 ----
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1519,1526 **** toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
--- 1555,1564 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
***************
*** 1531,1541 **** toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1569,1596 ----
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1548,1554 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
--- 1603,1610 ----
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel,
! RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
***************
*** 1556,1561 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
--- 1612,1622 ----
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ pfree(toastidxs);
+
return result;
}
***************
*** 1573,1579 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid);
heap_close(toastrel, AccessShareLock);
--- 1634,1640 ----
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
***************
*** 1591,1598 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
--- 1652,1659 ----
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
***************
*** 1607,1612 **** toast_fetch_datum(struct varlena * attr)
--- 1668,1676 ----
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
***************
*** 1622,1632 **** toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
--- 1686,1706 ----
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfValid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /* Open all the indexes of toast relation with similar lock */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
!
! /* Fetch relation used for process */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
***************
*** 1645,1651 **** toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1719,1725 ----
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1734,1741 **** toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 1808,1817 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
***************
*** 1750,1757 **** toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
--- 1826,1833 ----
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
***************
*** 1774,1779 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
--- 1850,1858 ----
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
***************
*** 1816,1826 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
--- 1895,1912 ----
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfValid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
***************
*** 1861,1867 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1947,1953 ----
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1958,1965 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 2044,2079 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+ /* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index in list of indexes for a toast relation. Those relations
+ * need to be already open prior calling this routine.
+ */
+ static Relation
+ toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+ {
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+ }
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 781,787 **** InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
--- 781,786 ----
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 103,109 **** static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! Oid reltoastidxid, double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
--- 103,109 ----
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
***************
*** 1072,1078 **** index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
--- 1072,1077 ----
***************
*** 1254,1260 **** index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
--- 1253,1258 ----
***************
*** 1764,1771 **** FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
--- 1762,1767 ----
***************
*** 1781,1788 **** FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
! bool hasindex, bool isprimary,
! Oid reltoastidxid, double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
--- 1777,1785 ----
*/
static void
index_update_stats(Relation rel,
! bool hasindex,
! bool isprimary,
! double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
***************
*** 1876,1890 **** index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
--- 1873,1878 ----
***************
*** 2072,2085 **** index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
--- 2060,2070 ----
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***************
*** 473,488 **** CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! pg_stat_get_blocks_fetched(X.oid) -
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_class X ON T.reltoastidxid = X.oid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
--- 473,488 ----
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! pg_stat_get_blocks_fetched(X.indrelid) -
! pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
! pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 1172,1179 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
--- 1172,1177 ----
***************
*** 1392,1410 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
! * If we're swapping two toast tables by content, do the same for their
! * indexes.
*/
if (swap_toast_by_content &&
! relform1->reltoastidxid && relform2->reltoastidxid)
! swap_relation_files(relform1->reltoastidxid,
! relform2->reltoastidxid,
! target_is_pg_class,
! swap_toast_by_content,
! is_internal,
! InvalidTransactionId,
! InvalidMultiXactId,
! mapped_tables);
/* Clean up. */
heap_freetuple(reltup1);
--- 1390,1451 ----
}
/*
! * If we're swapping two toast tables by content, do the same for all of
! * their indexes. The swap can actually be safely done only if the
! * relations have indexes.
*/
if (swap_toast_by_content &&
! relform1->reltoastrelid &&
! relform2->reltoastrelid)
! {
! Relation toastRel1, toastRel2;
!
! /* Open relations */
! toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
! toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
!
! /* Obtain index list */
! RelationGetIndexList(toastRel1);
! RelationGetIndexList(toastRel2);
!
! /* Check if the swap is possible for all the toast indexes */
! if (list_length(toastRel1->rd_indexlist) == 1 &&
! list_length(toastRel2->rd_indexlist) == 1)
! {
! ListCell *lc1, *lc2;
!
! /* Now swap each couple */
! lc2 = list_head(toastRel2->rd_indexlist);
! foreach(lc1, toastRel1->rd_indexlist)
! {
! Oid indexOid1 = lfirst_oid(lc1);
! Oid indexOid2 = lfirst_oid(lc2);
! swap_relation_files(indexOid1,
! indexOid2,
! target_is_pg_class,
! swap_toast_by_content,
! is_internal,
! InvalidTransactionId,
! InvalidMultiXactId,
! mapped_tables);
! lc2 = lnext(lc2);
! }
! }
! else
! {
! /*
! 				 * As this code path is only taken by shared catalogs, which cannot
! * have multiple indexes on their toast relation, simply return
! * an error.
! */
! ereport(ERROR,
! (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
! errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
! }
!
! heap_close(toastRel1, AccessExclusiveLock);
! heap_close(toastRel2, AccessExclusiveLock);
! }
/* Clean up. */
heap_freetuple(reltup1);
***************
*** 1529,1540 **** finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
! toastidx = toastrel->rd_rel->reltoastidxid;
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
--- 1570,1582 ----
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
! RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
***************
*** 1543,1553 **** finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
! /* ... and its index too */
! snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
! OIDOldHeap);
! RenameRelationInternal(toastidx,
! NewToastName, true);
}
relation_close(newrel, NoLock);
}
--- 1585,1607 ----
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
! /* ... and its indexes too */
! foreach(lc, toastrel->rd_indexlist)
! {
! /*
! * The first index keeps the former toast name and the
! * following entries have a suffix appended.
! */
! if (count == 0)
! snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
! OIDOldHeap);
! else
! snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
! OIDOldHeap, count);
! RenameRelationInternal(lfirst_oid(lc),
! NewToastName, true);
! count++;
! }
}
relation_close(newrel, NoLock);
}
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 8728,8734 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
--- 8728,8733 ----
***************
*** 8736,8741 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
--- 8735,8742 ----
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
***************
*** 8782,8788 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
! reltoastidxid = rel->rd_rel->reltoastidxid;
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
--- 8783,8795 ----
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
! /* Fetch the list of indexes on toast relation if necessary */
! if (OidIsValid(reltoastrelid))
! {
! Relation toastRel = relation_open(reltoastrelid, lockmode);
! reltoastidxids = RelationGetIndexList(toastRel);
! relation_close(toastRel, lockmode);
! }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
***************
*** 8863,8870 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
! if (OidIsValid(reltoastidxid))
! ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
}
/*
--- 8870,8884 ----
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
! foreach(lc, reltoastidxids)
! {
! Oid toastidxid = lfirst_oid(lc);
! if (OidIsValid(toastidxid))
! ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
! }
!
! /* Clean up */
! list_free(reltoastidxids);
}
/*
*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 575,582 **** DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid/reltoastidxid of
! * the toast table we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
--- 575,582 ----
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid of the toast table
! * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
***************
*** 588,594 **** DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
--- 588,593 ----
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***************
*** 332,338 **** pg_relation_size(PG_FUNCTION_ARGS)
}
/*
! * Calculate total on-disk size of a TOAST relation, including its index.
* Must not be applied to non-TOAST relations.
*/
static int64
--- 332,338 ----
}
/*
! * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
***************
*** 340,347 **** calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
toastRel = relation_open(toastrelid, AccessShareLock);
--- 340,347 ----
{
int64 size = 0;
Relation toastRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
***************
*** 351,362 **** calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
! relation_close(toastIdxRel, AccessShareLock);
relation_close(toastRel, AccessShareLock);
return size;
--- 351,370 ----
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! RelationGetIndexList(toastRel);
! /* Size is calculated using all the indexes available */
! foreach(lc, toastRel->rd_indexlist)
! {
! Relation toastIdxRel;
! toastIdxRel = relation_open(lfirst_oid(lc),
! AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
!
! relation_close(toastIdxRel, AccessShareLock);
! }
relation_close(toastRel, AccessShareLock);
return size;
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 2781,2796 **** binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.reltoastidxid "
"FROM pg_catalog.pg_class c LEFT JOIN "
! "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
! "WHERE c.oid = '%u'::pg_catalog.oid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
--- 2781,2797 ----
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
! "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
! "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
! "LIMIT 1",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
***************
*** 2816,2822 **** binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has an index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
--- 2817,2823 ----
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has at least one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
*** a/src/include/catalog/pg_class.h
--- b/src/include/catalog/pg_class.h
***************
*** 48,54 **** CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
--- 48,53 ----
***************
*** 94,100 **** typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
! #define Natts_pg_class 29
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
--- 93,99 ----
* ----------------
*/
! #define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
***************
*** 107,129 **** typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
! #define Anum_pg_class_reltoastidxid 13
! #define Anum_pg_class_relhasindex 14
! #define Anum_pg_class_relisshared 15
! #define Anum_pg_class_relpersistence 16
! #define Anum_pg_class_relkind 17
! #define Anum_pg_class_relnatts 18
! #define Anum_pg_class_relchecks 19
! #define Anum_pg_class_relhasoids 20
! #define Anum_pg_class_relhaspkey 21
! #define Anum_pg_class_relhasrules 22
! #define Anum_pg_class_relhastriggers 23
! #define Anum_pg_class_relhassubclass 24
! #define Anum_pg_class_relispopulated 25
! #define Anum_pg_class_relfrozenxid 26
! #define Anum_pg_class_relminmxid 27
! #define Anum_pg_class_relacl 28
! #define Anum_pg_class_reloptions 29
/* ----------------
* initial contents of pg_class
--- 106,127 ----
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
! #define Anum_pg_class_relhasindex 13
! #define Anum_pg_class_relisshared 14
! #define Anum_pg_class_relpersistence 15
! #define Anum_pg_class_relkind 16
! #define Anum_pg_class_relnatts 17
! #define Anum_pg_class_relchecks 18
! #define Anum_pg_class_relhasoids 19
! #define Anum_pg_class_relhaspkey 20
! #define Anum_pg_class_relhasrules 21
! #define Anum_pg_class_relhastriggers 22
! #define Anum_pg_class_relhassubclass 23
! #define Anum_pg_class_relispopulated 24
! #define Anum_pg_class_relfrozenxid 25
! #define Anum_pg_class_relminmxid 26
! #define Anum_pg_class_relacl 27
! #define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
***************
*** 138,150 **** typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
! DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
--- 136,148 ----
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
! DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
*** a/src/include/utils/relcache.h
--- b/src/include/utils/relcache.h
***************
*** 29,34 **** typedef struct RelationData *Relation;
--- 29,44 ----
typedef Relation *RelationPtr;
/*
+ * RelationGetIndexListIfValid
+ * Get index list of relation without recomputing it.
+ */
+ #define RelationGetIndexListIfValid(rel) \
+ do { \
+ if (rel->rd_indexvalid == 0) \
+ RelationGetIndexList(rel); \
+ } while(0)
+
+ /*
* Routines to open (lookup) and close a relcache entry
*/
extern Relation RelationIdGetRelation(Oid relationId);
*** a/src/test/regress/expected/oidjoins.out
--- b/src/test/regress/expected/oidjoins.out
***************
*** 353,366 **** WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
- ------+---------------
- (0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 353,358 ----
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
***************
*** 1852,1866 **** SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
! | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
--- 1852,1866 ----
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
! | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
***************
*** 2347,2357 **** select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | reltoastidxid | relkind | relfrozenxid
! ---------------+---------------+---------+--------------
! 0 | 0 | v | 0
(1 row)
drop view fooview;
--- 2347,2357 ----
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | relkind | relfrozenxid
! ---------------+---------+--------------
! 0 | v | 0
(1 row)
drop view fooview;
*** a/src/test/regress/sql/oidjoins.sql
--- b/src/test/regress/sql/oidjoins.sql
***************
*** 177,186 **** SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 177,182 ----
*** a/src/test/regress/sql/rules.sql
--- b/src/test/regress/sql/rules.sql
***************
*** 872,878 **** create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
--- 872,878 ----
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
*** a/src/tools/findoidjoins/README
--- b/src/tools/findoidjoins/README
***************
*** 86,92 **** Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
- Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
--- 86,91 ----
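
As a side note, the recovery path after a failed concurrent rebuild, as documented in the second patch, boils down to dropping the leftover invalid index and retrying. A sketch (tab and idx are placeholder names; the leftover index carries the _cct suffix described above):

```sql
-- A failure (for example a uniqueness violation) leaves an invalid
-- index behind, reported by \d as "idx_cct" ... INVALID
REINDEX TABLE tab CONCURRENTLY;
-- Drop the invalid leftover and simply retry
DROP INDEX idx_cct;
REINDEX TABLE tab CONCURRENTLY;
```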
20130606_2_reindex_concurrently_v26.patch
*** a/doc/src/sgml/mvcc.sgml
--- b/doc/src/sgml/mvcc.sgml
***************
*** 863,870 **** ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
! <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
! some forms of <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
--- 863,871 ----
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
! <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
! <command>REINDEX CONCURRENTLY</> and some forms of
! <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
*** a/doc/src/sgml/ref/reindex.sgml
--- b/doc/src/sgml/ref/reindex.sgml
***************
*** 21,27 **** PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
! REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
--- 21,27 ----
<refsynopsisdiv>
<synopsis>
! REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
***************
*** 68,76 **** REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
! <command>REINDEX</> will not perform a concurrent build. To build the
! index without interfering with production you should drop the index and
! reissue the <command>CREATE INDEX CONCURRENTLY</> command.
</para>
</listitem>
--- 68,88 ----
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
! <command>REINDEX</> will perform a concurrent build if <literal>
! CONCURRENTLY</> is specified. To build the index without interfering
! with production you should drop the index and reissue either the
! <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
! command. Indexes of toast relations can be rebuilt with <command>REINDEX
! CONCURRENTLY</>.
! </para>
! </listitem>
!
! <listitem>
! <para>
! Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
! EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
!      DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
!      backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
***************
*** 139,144 **** REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
--- 151,171 ----
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+       &mdash; see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
***************
*** 231,236 **** REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
--- 258,376 ----
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+    scans of the table for each index that needs to be rebuilt and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+    In a concurrent index build, a new index whose storage will replace that
+    of the one being rebuilt is first entered into the system catalogs in one
+    transaction, then two table scans occur in two more transactions. Once
+    this is done, the old and fresh indexes are swapped: the concurrent index
+    is marked as valid, then swapped with the old one and marked as invalid.
+    An exclusive lock is taken during this phase. Finally, two additional
+    transactions are used to mark the concurrent index as not ready and drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+ <programlisting>
+ postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+ --------+---------+-----------
+ col | integer |
+ Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+ </programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+    The concurrent index created during the processing has a name ending in
+    the suffix <literal>_cct</>. This also applies to indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+    </command> rebuilds only the non-system relations concurrently. System
+    relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+    <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+    on all the relations involved in the operation. When <command>CONCURRENTLY</command>
+    is specified, a <literal>SHARE UPDATE EXCLUSIVE</literal> lock is used
+    instead, except during the relation swap, where an <literal>ACCESS
+    EXCLUSIVE</literal> lock is taken.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
***************
*** 262,268 **** $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
! </programlisting></para>
</refsect1>
<refsect1>
--- 402,419 ----
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
! </programlisting>
! </para>
!
! <para>
!    Rebuild the indexes of a table while allowing read and write operations
!    on the relations involved:
!
! <programlisting>
! REINDEX TABLE CONCURRENTLY my_broken_table;
! </programlisting>
! </para>
!
</refsect1>
<refsect1>
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 43,51 ****
--- 43,53 ----
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+ #include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+ #include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
***************
*** 672,677 **** UpdateIndexRelation(Oid indexoid,
--- 674,683 ----
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+  * is_reindex: if true, create an index used as a duplicate of an existing
+  * index, as done during a concurrent reindex operation. Such an index can
+  * also belong to a toast relation. Sufficient locks are assumed to be
+  * already taken on the related relations when this is called concurrently.
*
* Returns the OID of the created index.
*/
***************
*** 695,701 **** index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
! bool is_internal)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
--- 701,708 ----
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
! bool is_internal,
! bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
***************
*** 738,756 **** index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
! * release locks before committing in catalogs
*/
if (concurrent &&
! IsSystemRelation(heapRelation))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
! * This case is currently not supported, but there's no way to ask for it
! * in the grammar anyway, so it can't happen.
*/
! if (concurrent && is_exclusion)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
--- 745,766 ----
/*
* concurrent index build on a system catalog is unsafe because we tend to
! * release locks before committing in catalogs. If the index is created during
! * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
! IsSystemRelation(heapRelation) &&
! !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
! * This case is currently only supported during a concurrent index
! * rebuild, but there is no way to ask for it in the grammar otherwise
! * anyway.
*/
! if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
***************
*** 1090,1095 **** index_create(Relation heapRelation,
--- 1100,1537 ----
return indexRelationId;
}
+
+ /*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+  * on. This is called during concurrent index processing. The heap relation
+  * on which the index is based needs to be closed by the caller.
+ */
+ Oid
+ index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+ {
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* The concurrent index uses the same index information as the former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initdeferred; this depends on its
+ * parent constraint, if any.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, needed to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the name picked has any conflict with existing names and
+ * change it.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, NoLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+ }
+
+
+ /*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken when
+ * this operation is performed, so that only schema changes are blocked.
+ */
+ void
+ index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+ {
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in
+ * commit of transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+ }
+
+
+ /*
+ * index_concurrent_swap
+ *
+ * Swap the old and new indexes in a concurrent context. For the time being,
+ * this simply switches the relfilenode of the two indexes. If
+ * extra operations are necessary during a concurrent swap, processing should
+ * be added here. AccessExclusiveLock is taken on the index relations that are
+ * swapped until the end of the transaction where this function is called.
+ * Note: a lower lock could be taken if the catalog caches, which use
+ * SnapshotNow, were correctly MVCC'd.
+ */
+ void
+ index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+ {
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+ }
+
+ /*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+ void
+ index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+ {
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+ }
+
+ /*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. In a concurrent context, this function should be
+ * called when initializing an index drop, before setting the index as
+ * dead.
+ */
+ void
+ index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+ {
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+ }
+
+ /*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of an index concurrent
+ * process. Deletion has to go through performDeletion, or dependencies of
+ * the index would not get dropped. At this point the index is already
+ * considered invalid and dead, so it can be dropped without any concurrent
+ * option, as it is certain not to interact with other server sessions.
+ */
+ void
+ index_concurrent_drop(Oid indexOid)
+ {
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index dropped here is not alive; if it were, it might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+ }
+
+
/*
* index_constraint_create
*
***************
*** 1325,1331 **** index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
--- 1767,1772 ----
***************
*** 1407,1423 **** index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
! /*
! * Mark index invalid by updating its pg_index entry
! */
! index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
!
! /*
! * Invalidate the relcache for the table, so that after this commit
! * all sessions will refresh any cached plans that might reference the
! * index.
! */
! CacheInvalidateRelcache(userHeapRelation);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
--- 1848,1855 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
! /* Mark the index as invalid */
! index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
***************
*** 1445,1507 **** index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
! /*
! * Now we must wait until no running transaction could be using the
! * index for a query. To do this, inquire which xacts currently would
! * conflict with AccessExclusiveLock on the table -- ie, which ones
! * have a lock of any kind on the table. Then wait for each of these
! * xacts to commit or abort. Note we do not need to worry about xacts
! * that open the table for reading after this point; they will see the
! * index as invalid when they open the relation.
! *
! * Note: the reason we use actual lock acquisition here, rather than
! * just checking the ProcArray and sleeping, is that deadlock is
! * possible if one of the transactions in question is blocked trying
! * to acquire an exclusive lock on our table. The lock code will
! * detect deadlock and error out properly.
! *
! * Note: GetLockConflicts() never reports our own xid, hence we need
! * not check for that. Also, prepared xacts are not reported, which
! * is fine since they certainly aren't going to do anything more.
! */
! old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
!
! while (VirtualTransactionIdIsValid(*old_lockholders))
! {
! VirtualXactLock(*old_lockholders, true);
! old_lockholders++;
! }
!
! /*
! * No more predicate locks will be acquired on this index, and we're
! * about to stop doing inserts into the index which could show
! * conflicts with existing predicate locks, so now is the time to move
! * them to the heap relation.
! */
! userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
! userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
! TransferPredicateLocksToHeapRelation(userIndexRelation);
!
! /*
! * Now we are sure that nobody uses the index for queries; they just
! * might have it open for updating it. So now we can unset indisready
! * and indislive, then wait till nobody could be using it at all
! * anymore.
! */
! index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
!
! /*
! * Invalidate the relcache for the table, so that after this commit
! * all sessions will refresh the table's index list. Forgetting just
! * the index's relcache entry is not enough.
! */
! CacheInvalidateRelcache(userHeapRelation);
!
! /*
! * Close the relations again, though still holding session lock.
! */
! heap_close(userHeapRelation, NoLock);
! index_close(userIndexRelation, NoLock);
/*
* Again, commit the transaction to make the pg_index update visible
--- 1877,1884 ----
CommitTransactionCommand();
StartTransactionCommand();
! /* Finish invalidation of index and mark it as dead */
! index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
***************
*** 1514,1526 **** index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
! old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
!
! while (VirtualTransactionIdIsValid(*old_lockholders))
! {
! VirtualXactLock(*old_lockholders, true);
! old_lockholders++;
! }
/*
* Re-open relations to allow us to complete our actions.
--- 1891,1897 ----
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
! WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
***************
*** 2991,3017 **** validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
! * flags that denote the index's state. We must use an in-place update of
! * the pg_index tuple, because we do not have exclusive lock on the parent
! * table and so other sessions might concurrently be doing SnapshotNow scans
! * of pg_index to identify the table's indexes. A transactional update would
! * risk somebody not seeing the index at all. Because the update is not
! * transactional and will not roll back on error, this must only be used as
! * the last step in a transaction that has not made any transactional catalog
! * updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
! index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
! /* Assert that current xact hasn't done any transactional updates */
! Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
--- 3362,3393 ----
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
! * flags that denote the index's state. If this function is called in a
! * concurrent process, we use an in-place update of the pg_index tuple,
! * because we do not have exclusive lock on the parent table and so other
! * sessions might concurrently be doing SnapshotNow scans of pg_index to
! * identify the table's indexes. A transactional update would risk somebody
! * not seeing the index at all. Because the update is not transactional
! * and will not roll back on error, this must only be used as the last step
! * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
! index_set_state_flags(Oid indexId,
! IndexStateFlagsAction action,
! bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
! /*
! * Assert that the current xact hasn't done any transactional updates;
! * there is nothing to worry about in a non-concurrent context.
! */
! Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
***************
*** 3071,3078 **** index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
! /* ... and write it back in-place */
! heap_inplace_update(pg_index, indexTuple);
heap_close(pg_index, RowExclusiveLock);
}
--- 3447,3466 ----
break;
}
! /*
! * In a concurrent context, write the tuple back in-place; otherwise do
! * a plain transactional update.
! */
! if (concurrent)
! {
! heap_inplace_update(pg_index, indexTuple);
! }
! else
! {
! simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
! CommandCounterIncrement();
! CatalogUpdateIndexes(pg_index, indexTuple);
! }
heap_close(pg_index, RowExclusiveLock);
}
*** a/src/backend/catalog/toasting.c
--- b/src/backend/catalog/toasting.c
***************
*** 281,287 **** create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
! true, false, false, true);
heap_close(toast_rel, NoLock);
--- 281,287 ----
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
! true, false, false, false, false);
heap_close(toast_rel, NoLock);
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***************
*** 68,75 **** static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
! List *colnames, List *exclusionOpNames,
! bool primary, bool isconstraint);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
--- 68,76 ----
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
! List *colnames, List *exclusionOpNames,
! bool primary, bool isconstraint,
! bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
***************
*** 311,317 **** DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
--- 312,317 ----
***************
*** 321,333 **** DefineIndex(IndexStmt *stmt,
IndexInfo *indexInfo;
int numberOfAttributes;
TransactionId limitXmin;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
--- 321,329 ----
***************
*** 454,460 **** DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
! stmt->isconstraint);
/*
* look up the access method, verify it can handle the requested features
--- 450,457 ----
indexColNames,
stmt->excludeOpNames,
stmt->primary,
! stmt->isconstraint,
! false);
/*
* look up the access method, verify it can handle the requested features
***************
*** 601,607 **** DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
! stmt->concurrent, !check_rights);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
--- 598,604 ----
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
! stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
***************
*** 664,681 **** DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
! old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
!
! while (VirtualTransactionIdIsValid(*old_lockholders))
! {
! VirtualXactLock(*old_lockholders, true);
! old_lockholders++;
! }
/*
* At this moment we are sure that there are no transactions with the
--- 661,668 ----
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
*/
! WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
***************
*** 695,728 **** DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
! /* We have to re-build the IndexInfo struct, since it was lost in commit */
! indexInfo = BuildIndexInfo(indexRelation);
! Assert(!indexInfo->ii_ReadyForInserts);
! indexInfo->ii_Concurrent = true;
! indexInfo->ii_BrokenHotChain = false;
!
! /* Now build the index */
! index_build(rel, indexRelation, indexInfo, stmt->primary, false);
!
! /* Close both the relations, but keep the locks */
! heap_close(rel, NoLock);
! index_close(indexRelation, NoLock);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
! index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
/* we can do away with our snapshot */
PopActiveSnapshot();
--- 682,701 ----
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
! /* Perform concurrent build of index */
! index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
! indexRelationId,
! stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
! index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
***************
*** 739,751 **** DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
! old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
!
! while (VirtualTransactionIdIsValid(*old_lockholders))
! {
! VirtualXactLock(*old_lockholders, true);
! old_lockholders++;
! }
/*
* Now take the "reference snapshot" that will be used by validate_index()
--- 712,718 ----
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
! WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
***************
*** 786,864 **** DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
! * transactions that might have older snapshots. Obtain a list of VXIDs
! * of such transactions, and wait for them individually.
! *
! * We can exclude any running transactions that have xmin > the xmin of
! * our reference snapshot; their oldest snapshot must be newer than ours.
! * We can also exclude any transactions that have xmin = zero, since they
! * evidently have no live snapshot at all (and any one they might be in
! * process of taking is certainly newer than ours). Transactions in other
! * DBs can be ignored too, since they'll never even be able to see this
! * index.
! *
! * We can also exclude autovacuum processes and processes running manual
! * lazy VACUUMs, because they won't be fazed by missing index entries
! * either. (Manual ANALYZEs, however, can't be excluded because they
! * might be within transactions that are going to do arbitrary operations
! * later.)
! *
! * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
! * check for that.
! *
! * If a process goes idle-in-transaction with xmin zero, we do not need to
! * wait for it anymore, per the above argument. We do not have the
! * infrastructure right now to stop waiting if that happens, but we can at
! * least avoid the folly of waiting when it is idle at the time we would
! * begin to wait. We do this by repeatedly rechecking the output of
! * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
! * doesn't show up in the output, we know we can forget about it.
*/
! old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
! PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
! &n_old_snapshots);
!
! for (i = 0; i < n_old_snapshots; i++)
! {
! if (!VirtualTransactionIdIsValid(old_snapshots[i]))
! continue; /* found uninteresting in previous cycle */
!
! if (i > 0)
! {
! /* see if anything's changed ... */
! VirtualTransactionId *newer_snapshots;
! int n_newer_snapshots;
! int j;
! int k;
!
! newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
! true, false,
! PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
! &n_newer_snapshots);
! for (j = i; j < n_old_snapshots; j++)
! {
! if (!VirtualTransactionIdIsValid(old_snapshots[j]))
! continue; /* found uninteresting in previous cycle */
! for (k = 0; k < n_newer_snapshots; k++)
! {
! if (VirtualTransactionIdEquals(old_snapshots[j],
! newer_snapshots[k]))
! break;
! }
! if (k >= n_newer_snapshots) /* not there anymore */
! SetInvalidVirtualTransactionId(old_snapshots[j]);
! }
! pfree(newer_snapshots);
! }
!
! if (VirtualTransactionIdIsValid(old_snapshots[i]))
! VirtualXactLock(old_snapshots[i], true);
! }
/*
* Index can now be marked valid -- update its pg_index entry
*/
! index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
/*
* The pg_index update will cause backends (including this one) to update
--- 753,766 ----
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
! * transactions that might have older snapshots.
*/
! WaitForOldSnapshots(limitXmin);
/*
* Index can now be marked valid -- update its pg_index entry
*/
! index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
***************
*** 880,885 **** DefineIndex(IndexStmt *stmt,
--- 782,1331 ----
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is done in parallel with all the table's indexes as well as its dependent
+ * toast indexes.
+ */
+ bool
+ ReindexRelationConcurrently(Oid relationOid)
+ {
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+ TransactionId limitXmin;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including the indexes of
+ * its associated toast table. If the relkind is an index, this index
+ * itself will be rebuilt. The locks taken on parent relations and
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid
+ * indexes cannot be rebuilt concurrently.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first create, for each index, a new index based on the same definition
+ * as the old one; at this stage it is only registered in the catalogs
+ * and will be built later. All these operations can be done in a single
+ * transaction for a parent relation, including its toast indexes.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index, which might be a plain or toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is needed
+ * on it as well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save a palloc'd copy of each lockrelid to protect the relations from
+ * being dropped, then close them. A pointer to a loop-local variable
+ * must not be appended to the list, as it would dangle after this
+ * iteration. The lockrelid of the parent relation is not taken here to
+ * avoid taking multiple locks on the same relation; instead we rely on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock tags for the wait phases that follow, during which
+ * other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a copy of the parent relation's lockrelid to the list of locked relations */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The new
+ * indexes are marked as not ready and invalid so that no other
+ * transaction will try to use them for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on each parent relation,
+ * each old index and each concurrent index, to ensure that none of them
+ * is dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping transactions open for an unnecessarily long time. Each
+ * concurrent index built here will replace one of the old indexes.
+ * Before doing that, we need to wait until no running transaction
+ * could still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Index relation has been closed by previous commit, so reopen it.
+ * Save the fields needed after closing; the Relation pointer must
+ * not be dereferenced once the index is closed.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapOid = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(heapOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table in the meantime.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * this concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * We can now do away with our active snapshot, but we still need to
+ * save its xmin limit to wait out older snapshots.
+ */
+ limitXmin = snapshot->xmin;
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * This concurrent index is now valid in the sense that it contains all
+ * the tuples necessary. However, it might not contain tuples deleted
+ * just before the reference snapshot was taken, so we need to wait for
+ * the transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(limitXmin);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The
+ * concurrent index is marked as valid before the swap; once the
+ * relfilenodes are swapped, the concurrent index's Oid holds the old
+ * data and is invalidated, so other backends stop using it once the
+ * transaction is committed.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the index and its parent relation. An AccessExclusiveLock is
+ * taken here rather than a lower-level lock to reduce the likelihood
+ * of deadlock, as a ShareUpdateExclusiveLock is already held at
+ * session level.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. As an exclusive lock is taken on the relations involved,
+ * it is safe to call this function as it would be called in a
+ * non-concurrent context.
+ * Note: with MVCC catalog access, a lower lock would be enough.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * The swap is done, so mark the index now holding the old data as invalid.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes. They need to be marked as dead so that running transactions
+ * stop using them. Each operation is performed in a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as dead */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the LOCKTAG of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation
+ * before setting the index as dead.
+ */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies of the old indexes will not be
+ * dropped. The internal mechanism of DROP INDEX CONCURRENTLY is not used,
+ * as the indexes are already considered dead and invalid here, so they
+ * will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the index, whose Oid now holds the old index data */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the
+ * parent tables and their indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+ }
+
+
+ /*
* CheckMutability
* Test whether given expression is mutable
*/
***************
*** 1542,1548 **** ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
! bool primary, bool isconstraint)
{
char *indexname;
--- 1988,1995 ----
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
! bool primary, bool isconstraint,
! bool concurrent)
{
char *indexname;
***************
*** 1568,1573 **** ChooseIndexName(const char *tabname, Oid namespaceId,
--- 2015,2027 ----
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
***************
*** 1680,1697 **** ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
! ReindexIndex(RangeVar *indexRelation)
{
Oid indOid;
Oid heapOid = InvalidOid;
! /* lock level used here should match index lock reindex_index() */
! indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
! false, false,
! RangeVarCallbackForReindexIndex,
! (void *) &heapOid);
! reindex_index(indOid, false);
return indOid;
}
--- 2134,2155 ----
* Recreate a specific index.
*/
Oid
! ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
! indOid = RangeVarGetRelidExtended(indexRelation,
! concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
! false, false,
! RangeVarCallbackForReindexIndex,
! (void *) &heapOid);
! /* Continue process for concurrent or non-concurrent case */
! if (!concurrent)
! reindex_index(indOid, false);
! else
! ReindexRelationConcurrently(indOid);
return indOid;
}
***************
*** 1760,1772 **** RangeVarCallbackForReindexIndex(const RangeVar *relation,
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
! ReindexTable(RangeVar *relation)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
! heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
! RangeVarCallbackOwnsTable, NULL);
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
--- 2218,2244 ----
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
! ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
! heapOid = RangeVarGetRelidExtended(relation,
! concurrent ? ShareUpdateExclusiveLock : ShareLock,
! false, false,
! RangeVarCallbackOwnsTable, NULL);
!
! /* Run through the concurrent process if necessary */
! if (concurrent)
! {
! if (!ReindexRelationConcurrently(heapOid))
! {
! ereport(NOTICE,
! (errmsg("table \"%s\" has no indexes",
! relation->relname)));
! }
! return heapOid;
! }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
***************
*** 1785,1791 **** ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
! ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
{
Relation relationRelation;
HeapScanDesc scan;
--- 2257,2266 ----
* That means this must not be called within a user transaction block!
*/
Oid
! ReindexDatabase(const char *databaseName,
! bool do_system,
! bool do_user,
! bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
***************
*** 1797,1802 **** ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
--- 2272,2286 ----
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for REINDEX SYSTEM, but it
+ * is for REINDEX DATABASE.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system catalogs concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
***************
*** 1880,1894 **** ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
! if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
! (errmsg("table \"%s.%s\" was reindexed",
get_namespace_name(get_rel_namespace(relid)),
! get_rel_name(relid))));
PopActiveSnapshot();
CommitTransactionCommand();
}
--- 2364,2403 ----
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
!
! /* Determine if relation needs to be processed concurrently */
! process_concurrent = concurrent &&
! !IsSystemNamespace(get_rel_namespace(relid));
!
! /*
! * Reindex the relation with a concurrent or non-concurrent process.
! * System relations cannot be reindexed concurrently, but they still
! * need to be reindexed (including pg_class) with the normal process,
! * as they could be corrupted and the concurrent process itself
! * relies on them. This does not include toast relations, which are
! * reindexed when their parent relation is processed.
! */
! if (process_concurrent)
! {
! old = MemoryContextSwitchTo(private_context);
! result = ReindexRelationConcurrently(relid);
! MemoryContextSwitchTo(old);
! }
! else
! result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
!
! if (result)
ereport(NOTICE,
! (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
! get_rel_name(relid),
! process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 900,905 **** RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
--- 900,937 ----
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check for the case of a system index that might have been invalidated
+ * by a failed concurrent process, and allow it to be dropped. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
*** a/src/backend/executor/execUtils.c
--- b/src/backend/executor/execUtils.c
***************
*** 1201,1206 **** check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
--- 1201,1220 ----
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and since CREATE INDEX CONCURRENTLY is not available for
+ * exclusion constraints, this code path can only be taken by REINDEX
+ * CONCURRENTLY. In that case the same index exists in parallel to this
+ * one, so we can bypass this check, as it has already been done on the
+ * other index. If exclusion constraints are supported by CREATE INDEX
+ * CONCURRENTLY in the future, this will need to be revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***************
*** 3617,3622 **** _copyReindexStmt(const ReindexStmt *from)
--- 3617,3623 ----
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***************
*** 1839,1844 **** _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
--- 1839,1845 ----
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***************
*** 6752,6780 **** opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
! REINDEX reindex_type qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
! n->relation = $3;
n->name = NULL;
$$ = (Node *)n;
}
! | REINDEX SYSTEM_P name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
! n->name = $3;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
! | REINDEX DATABASE name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
! n->name = $3;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
--- 6752,6783 ----
*****************************************************************************/
ReindexStmt:
! REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
! n->concurrent = $3;
! n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
! | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
! n->concurrent = $3;
! n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
! | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
! n->concurrent = $3;
! n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 2528,2533 **** XidCacheRemoveRunningXids(TransactionId xid,
--- 2528,2679 ----
LWLockRelease(ProcArrayLock);
}
+
+ /*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a lock conflicting with the given
+ * lockmode on any of the relations identified by the given LOCKTAGs.
+ * To do this, inquire which xacts currently would conflict with the
+ * lockmode on each relation -- ie, which ones have a lock that permits
+ * writing the relation. Then wait for each of these xacts to commit or
+ * abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+ void
+ WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+ {
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+ }
+
+
+ /*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+ void
+ WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+ {
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+ }
+
+
+ /*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * xmin limit, because an index built under that limit might not contain
+ * tuples deleted just before the reference snapshot was taken. Obtain a
+ * list of VXIDs of such transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin given;
+ * their oldest snapshot must be newer than our xmin limit.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+ void
+ WaitForOldSnapshots(TransactionId limitXmin)
+ {
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+ }
+
+
#ifdef XIDCACHE_DEBUG
/*
*** a/src/backend/tcop/utility.c
--- b/src/backend/tcop/utility.c
***************
*** 778,793 **** standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
! ReindexIndex(stmt->relation);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
! ReindexTable(stmt->relation);
break;
case OBJECT_DATABASE:
--- 778,797 ----
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
! ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
! ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
***************
*** 799,806 **** standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
! ReindexDatabase(stmt->name,
! stmt->do_system, stmt->do_user);
break;
default:
elog(ERROR, "unrecognized object type: %d",
--- 803,810 ----
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
! ReindexDatabase(stmt->name, stmt->do_system,
! stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
***************
*** 60,66 **** extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
! bool is_internal);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
--- 60,87 ----
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
! bool is_internal,
! bool is_reindex);
!
! extern Oid index_concurrent_create(Relation heapRelation,
! Oid indOid,
! char *concurrentName);
!
! extern void index_concurrent_build(Oid heapOid,
! Oid indexOid,
! bool isprimary);
!
! extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
!
! extern void index_concurrent_set_dead(Oid indexId,
! Oid heapId,
! LOCKTAG locktag);
!
! extern void index_concurrent_clear_valid(Relation heapRelation,
! Oid indexOid,
! bool concurrent);
!
! extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
***************
*** 100,106 **** extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
! extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
--- 121,129 ----
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
! extern void index_set_state_flags(Oid indexId,
! IndexStateFlagsAction action,
! bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
*** a/src/include/commands/defrem.h
--- b/src/include/commands/defrem.h
***************
*** 26,35 **** extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
! extern Oid ReindexIndex(RangeVar *indexRelation);
! extern Oid ReindexTable(RangeVar *relation);
extern Oid ReindexDatabase(const char *databaseName,
! bool do_system, bool do_user);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
--- 26,36 ----
bool check_rights,
bool skip_build,
bool quiet);
! extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
! extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
! bool do_system, bool do_user, bool concurrent);
! extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
*** a/src/include/nodes/parsenodes.h
--- b/src/include/nodes/parsenodes.h
***************
*** 2538,2543 **** typedef struct ReindexStmt
--- 2538,2544 ----
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
*** a/src/include/storage/procarray.h
--- b/src/include/storage/procarray.h
***************
*** 76,79 **** extern void XidCacheRemoveRunningXids(TransactionId xid,
--- 76,83 ----
int nxids, const TransactionId *xids,
TransactionId latestXid);
+ extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+ extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+ extern void WaitForOldSnapshots(TransactionId limitXmin);
+
#endif /* PROCARRAY_H */
*** a/src/test/regress/expected/create_index.out
--- b/src/test/regress/expected/create_index.out
***************
*** 2721,2723 **** ORDER BY thousand;
--- 2721,2778 ----
1 | 1001
(2 rows)
+ --
+ -- Check behavior of REINDEX and REINDEX CONCURRENTLY
+ --
+ CREATE TABLE concur_reindex_tab (c1 int);
+ -- REINDEX
+ REINDEX TABLE concur_reindex_tab; -- notice
+ NOTICE: table "concur_reindex_tab" has no indexes
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ NOTICE: table "concur_reindex_tab" has no indexes
+ ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+ -- Normal index with integer column
+ CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+ -- Normal index with text column
+ CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+ -- UNIQUE index with expression
+ CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+ -- Duplicate column names
+ CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+ -- Create table for check on foreign key dependence switch with indexes swapped
+ ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+ CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+ INSERT INTO concur_reindex_tab VALUES (1, 'a');
+ INSERT INTO concur_reindex_tab VALUES (2, 'a');
+ -- Check materialized views
+ CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+ REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+ -- Check errors
+ -- Cannot run inside a transaction block
+ BEGIN;
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+ COMMIT;
+ REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ ERROR: concurrent reindex is not supported for shared relations
+ REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ ERROR: cannot reindex system concurrently
+ -- Check the relation status, there should not be invalid indexes
+ \d concur_reindex_tab
+ Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+ --------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+ Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+ Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+ DROP MATERIALIZED VIEW concur_reindex_matview;
+ DROP TABLE concur_reindex_tab, concur_reindex_tab2;
*** a/src/test/regress/sql/create_index.sql
--- b/src/test/regress/sql/create_index.sql
***************
*** 912,914 **** ORDER BY thousand;
--- 912,954 ----
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+ --
+ -- Check behavior of REINDEX and REINDEX CONCURRENTLY
+ --
+ CREATE TABLE concur_reindex_tab (c1 int);
+ -- REINDEX
+ REINDEX TABLE concur_reindex_tab; -- notice
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+ -- Normal index with integer column
+ CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+ -- Normal index with text column
+ CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+ -- UNIQUE index with expression
+ CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+ -- Duplicate column names
+ CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+ -- Create table for check on foreign key dependence switch with indexes swapped
+ ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+ CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+ INSERT INTO concur_reindex_tab VALUES (1, 'a');
+ INSERT INTO concur_reindex_tab VALUES (2, 'a');
+ -- Check materialized views
+ CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+ REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+ -- Check errors
+ -- Cannot run inside a transaction block
+ BEGIN;
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ COMMIT;
+ REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+ -- Check the relation status, there should not be invalid indexes
+ \d concur_reindex_tab
+ DROP MATERIALIZED VIEW concur_reindex_matview;
+ DROP TABLE concur_reindex_tab, concur_reindex_tab2;
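One note on the WaitForOldSnapshots helper added above: its inner recheck only
crosses off vxids that have disappeared between two calls to
GetCurrentVirtualXIDs, so the function never waits on a transaction that has
already finished. A minimal sketch of that bookkeeping (illustrative only:
plain ints stand in for VirtualTransactionId, and 0 plays the role of the
invalid marker) could look like:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: 0 marks an invalidated ("uninteresting") vxid. */
#define INVALID_VXID 0

/*
 * Cross off every entry of old_vxids[start..n_old-1] that no longer appears
 * in newer[]: a transaction that vanished between the two snapshots has
 * ended, so there is no need to wait on it anymore.
 */
static void
forget_finished_vxids(int *old_vxids, int n_old, int start,
					  const int *newer, int n_newer)
{
	for (int j = start; j < n_old; j++)
	{
		bool		still_running = false;

		if (old_vxids[j] == INVALID_VXID)
			continue;			/* already found uninteresting */

		for (int k = 0; k < n_newer; k++)
		{
			if (old_vxids[j] == newer[k])
			{
				still_running = true;
				break;
			}
		}
		if (!still_running)
			old_vxids[j] = INVALID_VXID;	/* not there anymore */
	}
}
```

In the real function the entries that survive this pass are then waited on one
by one with VirtualXactLock().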
On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Hi all,
Please find attached the latest versions of REINDEX CONCURRENTLY for the 1st
commit fest of 9.4:
- 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid to allow
a toast relation to have multiple indexes in parallel (extra indexes
can be created by a REINDEX CONCURRENTLY process)
- 20130606_2_reindex_concurrently_v26.patch, correcting some comments and
fixing a lock in index_concurrent_create on an index relation that was not
released at the end of a transaction
Could you let me know what this patch has to do with the MVCC catalog
access patch? Should we wait for the MVCC catalog access patch to be
committed before starting to review this patch?
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Hi all,
Please find attached the latest versions of REINDEX CONCURRENTLY for the 1st
commit fest of 9.4:
- 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid to allow
a toast relation to have multiple indexes in parallel (extra indexes
can be created by a REINDEX CONCURRENTLY process)
- 20130606_2_reindex_concurrently_v26.patch, correcting some comments and
fixing a lock in index_concurrent_create on an index relation that was not
released at the end of a transaction
Could you let me know what this patch has to do with the MVCC catalog
access patch? Should we wait for the MVCC catalog access patch to be
committed before starting to review this patch?
I wondered the same. The MVCC catalog patch, if applied, would make it
possible to do the actual relfilenode swap concurrently instead of
requiring access exclusive locks to be taken, which obviously is way
nicer. On the other hand, that function is only a really small part of
this patch, so it seems quite possible to make another pass at it before
relying on mvcc catalog scans.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 17, 2013 at 5:23 AM, Andres Freund <andres@2ndquadrant.com>wrote:
On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Hi all,
Please find attached the latest versions of REINDEX CONCURRENTLY for
the 1st commit fest of 9.4:
- 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid to
allow a toast relation to have multiple indexes in parallel (extra
indexes can be created by a REINDEX CONCURRENTLY process)
- 20130606_2_reindex_concurrently_v26.patch, correcting some comments
and fixing a lock in index_concurrent_create on an index relation that
was not released at the end of a transaction
Could you let me know what this patch has to do with the MVCC catalog
access patch? Should we wait for the MVCC catalog access patch to be
committed before starting to review this patch?
I wondered the same. The MVCC catalog patch, if applied, would make it
possible to do the actual relfilenode swap concurrently instead of
requiring access exclusive locks to be taken, which obviously is way
nicer. On the other hand, that function is only a really small part of
this patch, so it seems quite possible to make another pass at it before
relying on mvcc catalog scans.
As mentioned by Andres, the only thing that the MVCC catalog patch can
improve here is the index swap phase (index_concurrent_swap in index.c),
where the relfilenodes of the old and new indexes are exchanged.
Currently an AccessExclusiveLock is taken on the 2 relations being
swapped; with MVCC catalog access I think we could relax that to a
ShareUpdateExclusiveLock.
Also, with the MVCC catalog patch in, we could add some isolation tests
for REINDEX CONCURRENTLY (there were some in one of the previous
versions), which is currently not possible because of the exclusive lock
taken at the swap phase.
Btw, those are minor things in the patch, so I think that it would be
better not to wait for the MVCC catalog patch. Even if you think that it
would be better to wait for it, you could begin with the 1st patch,
allowing a toast relation to have multiple indexes (removal of
reltoastidxid), which does not depend on it at all.
Thanks,
--
Michael
On 6/17/13 8:23 AM, Michael Paquier wrote:
As mentioned by Andres, the only thing that the MVCC catalog patch can
improve here is the index swap phase (index_concurrent_swap in index.c),
where the relfilenodes of the old and new indexes are exchanged.
Currently an AccessExclusiveLock is taken on the 2 relations being
swapped; with MVCC catalog access I think we could relax that to a
ShareUpdateExclusiveLock.
Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
not really concurrent, at least not concurrent to the standard set by
CREATE and DROP INDEX CONCURRENTLY.
On 2013-06-17 09:12:12 -0400, Peter Eisentraut wrote:
On 6/17/13 8:23 AM, Michael Paquier wrote:
As mentioned by Andres, the only thing that the MVCC catalog patch can
improve here is the index swap phase (index_concurrent_swap in index.c),
where the relfilenodes of the old and new indexes are exchanged.
Currently an AccessExclusiveLock is taken on the 2 relations being
swapped; with MVCC catalog access I think we could relax that to a
ShareUpdateExclusiveLock.
Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
not really concurrent, at least not concurrent to the standard set by
CREATE and DROP INDEX CONCURRENTLY.
Well, it still does the main body of work in a concurrent fashion, so I
still don't see how that argument holds that much water. But anyway, the
argument was only whether we could continue reviewing before the mvcc
stuff goes in, not whether it can get committed before.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 6/17/13 9:19 AM, Andres Freund wrote:
Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
not really concurrent, at least not concurrent to the standard set by
CREATE and DROP INDEX CONCURRENTLY.
Well, it still does the main body of work in a concurrent fashion, so I
still don't see how that argument holds that much water.
The reason we added DROP INDEX CONCURRENTLY is so that you don't get
stuck in a lock situation like
long-running-transaction <- DROP INDEX <- everything else
If we accepted REINDEX CONCURRENTLY as currently proposed, then it would
have the same problem.
I don't think we should accept a REINDEX CONCURRENTLY implementation
that is worse in that respect than a manual CREATE INDEX CONCURRENTLY +
DROP INDEX CONCURRENTLY combination.
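For a plain, non-constraint index, the manual combination mentioned here is
roughly the following (index and table names made up for illustration):

```sql
CREATE INDEX CONCURRENTLY idx_new ON tab (col); -- build the replacement
DROP INDEX CONCURRENTLY idx;                    -- drop the old index
ALTER INDEX idx_new RENAME TO idx;              -- keep the original name
```

This avoids blocking reads and writes on tab, although the final rename still
takes a short exclusive lock on the index itself, and the recipe cannot handle
primary keys, exclusion constraints, or toast indexes.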
On 2013-06-17 11:03:35 -0400, Peter Eisentraut wrote:
On 6/17/13 9:19 AM, Andres Freund wrote:
Without getting rid of the AccessExclusiveLock, REINDEX CONCURRENTLY is
not really concurrent, at least not concurrent to the standard set by
CREATE and DROP INDEX CONCURRENTLY.
Well, it still does the main body of work in a concurrent fashion, so I
still don't see how that argument holds that much water.
The reason we added DROP INDEX CONCURRENTLY is so that you don't get
stuck in a lock situation like
long-running-transaction <- DROP INDEX <- everything else
If we accepted REINDEX CONCURRENTLY as currently proposed, then it would
have the same problem.
I don't think we should accept a REINDEX CONCURRENTLY implementation
that is worse in that respect than a manual CREATE INDEX CONCURRENTLY +
DROP INDEX CONCURRENTLY combination.
Well, it can do lots of stuff that DROP/CREATE CONCURRENTLY can't:
* reindex primary keys
* reindex keys referenced by foreign keys
* reindex exclusion constraints
* reindex toast tables
* do all that for a whole database
so I don't think that comparison is fair. Having it would have made
several previous point releases far less painful (e.g. 9.1.6/9.2.1).
But anyway, as I said, "the argument was only whether we could
continue reviewing before the mvcc stuff goes in, not whether it can get
committed before."
I don't think we have a need to decide whether REINDEX CONCURRENTLY can
go in with the short exclusive lock unless we find unresolvable
problems with the mvcc patch. Which I very, very much hope not to be the
case.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Jun 17, 2013 at 9:23 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Mon, Jun 17, 2013 at 5:23 AM, Andres Freund <andres@2ndquadrant.com>
wrote:On 2013-06-17 04:20:03 +0900, Fujii Masao wrote:
On Thu, Jun 6, 2013 at 1:29 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Hi all,
Please find attached the latest versions of REINDEX CONCURRENTLY for
the 1st commit fest of 9.4:
- 20130606_1_remove_reltoastidxid_v9.patch, removing reltoastidxid to
allow a toast relation to have multiple indexes in parallel (extra
indexes can be created by a REINDEX CONCURRENTLY process)
- 20130606_2_reindex_concurrently_v26.patch, correcting some comments
and fixing a lock in index_concurrent_create on an index relation not
released at
the end of a transaction
Could you let me know what this patch has to do with the MVCC catalog
access patch? Should we wait for the MVCC catalog access patch to be
committed before starting to review this patch?
I wondered the same. The MVCC catalog patch, if applied, would make it
possible to do the actual relfilenode swap concurrently instead of
requiring access exclusive locks to be taken, which obviously is way
nicer. On the other hand, that function is only a really small part of
this patch, so it seems quite possible to make another pass at it before
relying on
mvcc catalog scans.
As mentioned by Andres, the only thing that the MVCC catalog patch can
improve here is the index swap phase (index_concurrent_swap in index.c),
where the relfilenodes of the old and new indexes are exchanged.
Currently an AccessExclusiveLock is taken on the 2 relations being
swapped; with MVCC catalog access I think we could relax that to a
ShareUpdateExclusiveLock.
Also, with the MVCC catalog patch in, we could add some isolation tests
for REINDEX CONCURRENTLY (there were some in one of the previous
versions), which is currently not possible because of the exclusive lock
taken at the swap phase.
Btw, those are minor things in the patch, so I think that it would be
better not to wait for the MVCC catalog patch. Even if you think that it
would be better to wait for it, you could begin with the 1st patch,
allowing a toast relation to have multiple indexes (removal of
reltoastidxid), which does not depend on it at all.
Here are the review comments of the removal_of_reltoastidxid patch.
I've not completed the review yet, but I'd like to post the current comments
before going to bed ;)
*** a/src/backend/catalog/system_views.sql
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
ISTM that X.indrelid indicates the TOAST table, not the TOAST index.
Shouldn't we use X.indexrelid instead of X.indrelid?
You changed some SQL queries because of the removal of reltoastidxid.
Could you check again that the original queries and the changed ones
return the same values?
doc/src/sgml/diskusage.sgml
There will be one index on the
<acronym>TOAST</> table, if present.
I'm not sure if multiple indexes on a TOAST table are visible to a user.
If they are, we need to correct the above description.
doc/src/sgml/monitoring.sgml
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
<entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
<entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
For the same reason as above, don't we need to change "index" to
"indexes" in these descriptions?
*** a/src/bin/pg_dump/pg_dump.c
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",
Is there a case where a TOAST table has more than one *valid* index?
If yes, is it really okay to choose just one index by using LIMIT 1?
If no, i.e., a TOAST table should have only one valid index, we should
get rid of the LIMIT 1 and check that only one row is returned from this
query. Fortunately, ISTM this check has already been done by the
subsequent call of ExecuteSqlQueryForSingleRow(). Thoughts?
Regards,
--
Fujii Masao
Well, it can do lots stuff that DROP/CREATE CONCURRENTLY can't:
* reindex primary keys
* reindex keys referenced by foreign keys
* reindex exclusion constraints
* reindex toast tables
* do all that for a whole database
so I don't think that comparison is fair. Having it would have made
several previous point releases far less painful (e.g. 9.1.6/9.2.1).
FWIW, I have a client who needs this implementation enough that we're
backporting it to 9.1 for them.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2013-06-17 12:52:36 -0700, Josh Berkus wrote:
Well, it can do lots stuff that DROP/CREATE CONCURRENTLY can't:
* reindex primary keys
* reindex keys referenced by foreign keys
* reindex exclusion constraints
* reindex toast tables
* do all that for a whole database
so I don't think that comparison is fair. Having it would have made
several previous point releases far less painful (e.g. 9.1.6/9.2.1).
FWIW, I have a client who needs this implementation enough that we're
backporting it to 9.1 for them.
Wait. What? Unless you break catalog compatibility that's not safely
possible using this implementation.
Greetings,
Andres Freund
PS: Josh, minor thing, but could you please not trim the CC list, at
least when I am on it?
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
PS: Josh, minor thing, but could you please not trim the CC list, at
least when I am on it?
Yes, it's annoying.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 06/17/2013 01:40 PM, Alvaro Herrera wrote:
Andres Freund wrote:
PS: Josh, minor thing, but could you please not trim the CC list, at
least when I am on it?
Yes, it's annoying.
I also get private comments from people who don't want me to cc them
when they are already on the list. I can't satisfy everyone.
--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2013-06-17 13:46:07 -0700, Josh Berkus wrote:
On 06/17/2013 01:40 PM, Alvaro Herrera wrote:
Andres Freund wrote:
PS: Josh, minor thing, but could you please not trim the CC list, at
least when I am on it?
Yes, it's annoying.
I also get private comments from people who don't want me to cc them
when they are already on the list. I can't satisfy everyone.
Given that nobody but you trims the CC list, I don't find that a
convincing argument.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
An updated patch for the toast part is attached.
On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Here are the review comments of the removal_of_reltoastidxid patch.
I've not completed the review yet, but I'd like to post the current comments
before going to bed ;)
*** a/src/backend/catalog/system_views.sql
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ pg_stat_get_blocks_fetched(X.indrelid) -
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+ pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit
ISTM that X.indrelid indicates the TOAST table, not the TOAST index.
Shouldn't we use X.indexrelid instead of X.indrelid?
Indeed, good catch! We need the statistics on the index in this case,
but here I used the table OID. Btw, I also noticed that, as multiple
indexes may be involved for a given toast relation, it makes sense to
calculate tidx_blks_read and tidx_blks_hit as the sum of the stats of
all those indexes.
You changed some SQL queries because of the removal of reltoastidxid.
Could you check again that the original queries and the changed ones
return the same values?
Sure, here are some results I am getting for pg_statio_all_tables with
a simple example to get stats on a table that has a toast relation.
With patch (after correcting to indexrelid and defining stats as a sum):
ioltas=# select relname, toast_blks_hit, tidx_blks_read from
pg_statio_all_tables where relname ='aa';
relname | toast_blks_hit | tidx_blks_read
---------+----------------+----------------
aa | 433313 | 829
(1 row)
With master:
relname | toast_blks_hit | tidx_blks_read
---------+----------------+----------------
aa | 433313 | 829
(1 row)
So the results are the same.
doc/src/sgml/diskusage.sgml
There will be one index on the
<acronym>TOAST</> table, if present.
I'm not sure if multiple indexes on a TOAST table are visible to a user.
If they are, we need to correct the above description.
AFAIK, toast indexes are not directly visible to the user.
ioltas=# \d aa
Table "public.aa"
Column | Type | Modifiers
--------+---------+-----------
a | integer |
b | text |
ioltas=# select l.relname from pg_class c join pg_class l on
(c.reltoastrelid = l.oid) where c.relname = 'aa';
relname
----------------
pg_toast_16386
(1 row)
However you can still query the schema pg_toast to get details about a
toast relation.
ioltas=# \d pg_toast.pg_toast_16386_index
Index "pg_toast.pg_toast_16386_index"
Column | Type | Definition
-----------+---------+------------
chunk_id | oid | chunk_id
chunk_seq | integer | chunk_seq
primary key, btree, for table "pg_toast.pg_toast_16386"
doc/src/sgml/monitoring.sgml
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
<entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
<entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
For the same reason as above, don't we need to change "index" to
"indexes" in these descriptions?
Yes, it makes sense. Changed it this way. After some more searching with
grep, I haven't noticed any other places where the docs would need to be
corrected.
*** a/src/bin/pg_dump/pg_dump.c
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+ "LIMIT 1",
Is there a case where a TOAST table has more than one *valid* index?
I just rechecked the patch, and the answer is no. The concurrent index
is marked as valid inside the same transaction as the swap. So only the
backend performing the swap will be able to see two valid toast indexes
at the same time.
If yes, is it really okay to choose just one index by using LIMIT 1?
If no, i.e., a TOAST table should have only one valid index, we should
get rid of the LIMIT 1 and check that only one row is returned from this
query. Fortunately, ISTM this check has already been done by the
subsequent call of ExecuteSqlQueryForSingleRow(). Thoughts?
Hum, this is debatable, but for simplicity of the pg_dump code, let's
remove this LIMIT clause and rely on the assumption that a toast
relation can only have one valid index at any given moment.
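As a sanity check of that assumption on a live cluster, a query along these
lines (illustrative, not part of the patch) should return no rows except while
a concurrent rebuild of a toast index is in flight, since toast tables have
relkind 't':

```sql
SELECT i.indrelid::regclass, count(*)
FROM pg_index i
JOIN pg_class t ON i.indrelid = t.oid
WHERE t.relkind = 't'      -- toast tables only
  AND i.indisvalid
GROUP BY 1
HAVING count(*) > 1;       -- toast tables with more than one valid index
```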
--
Michael
Attachments:
20130617_1_remove_reltoastidxid_v10.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..3a6342c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indrelid IN (SELECT reltoastrelid "
+ " FROM pg_class "
+ " WHERE oid >= %u "
+ " AND reltoastrelid != %u)",
+ FirstNormalObjectId, InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e638a8f..f3d1d9e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..461deb9 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -20,12 +20,12 @@
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
- table (see <xref linkend="storage-toast">). There will be one index on the
- <acronym>TOAST</> table, if present. There also might be indexes associated
- with the base table. Each table and index is stored in a separate disk
- file — possibly more than one file, if the file would exceed one
- gigabyte. Naming conventions for these files are described in <xref
- linkend="storage-file-layout">.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
+ associated with the base table. Each table and index is stored in a
+ separate disk file — possibly more than one file, if the file would
+ exceed one gigabyte. Naming conventions for these files are described
+ in <xref linkend="storage-file-layout">.
</para>
<para>
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..d38c009 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1163,12 +1163,12 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
- <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
+ <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
- <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
+ <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..a2064eb 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,13 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
@@ -1237,8 +1239,8 @@ static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1259,29 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1346,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
@@ -1367,7 +1383,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,7 +1401,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1440,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1447,10 +1466,12 @@ toast_save_datum(Relation rel, Datum value,
}
/*
- * Done - close toast relation
+ * Done - close toast relations
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1495,14 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1511,22 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index but taking a lock on all is
+ * necessary.
+ */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1541,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1555,10 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1531,11 +1569,28 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,7 +1603,8 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1556,6 +1612,11 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ pfree(toastidxs);
+
return result;
}
@@ -1573,7 +1634,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
@@ -1591,8 +1652,8 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1668,9 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1686,21 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+
+ /* Fetch relation used for process */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1719,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1808,10 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1750,8 +1826,8 @@ toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1850,9 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1895,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1947,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2044,36 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+/* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index in list of indexes for a toast relation. Those relations
+ * need to be already open prior calling this routine.
+ */
+static Relation
+toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+{
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 45a84e4..e08954e 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -781,7 +781,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..e196a0c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1072,7 +1072,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1254,7 +1253,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1764,8 +1762,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1781,8 +1777,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1876,15 +1873,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2072,14 +2060,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..3c2a474 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ sum(pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
+ sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..af00c36 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1172,8 +1172,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1392,19 +1390,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can actually be safely done only if the
+ * relations have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- is_internal,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Relation toastRel1, toastRel2;
+
+ /* Open relations */
+ toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+ toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+ /* Obtain index list */
+ RelationGetIndexList(toastRel1);
+ RelationGetIndexList(toastRel2);
+
+ /* Check if the swap is possible for all the toast indexes */
+ if (list_length(toastRel1->rd_indexlist) == 1 &&
+ list_length(toastRel2->rd_indexlist) == 1)
+ {
+ ListCell *lc1, *lc2;
+
+ /* Now swap each couple */
+ lc2 = list_head(toastRel2->rd_indexlist);
+ foreach(lc1, toastRel1->rd_indexlist)
+ {
+ Oid indexOid1 = lfirst_oid(lc1);
+ Oid indexOid2 = lfirst_oid(lc2);
+ swap_relation_files(indexOid1,
+ indexOid2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ is_internal,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ lc2 = lnext(lc2);
+ }
+ }
+ else
+ {
+ /*
+ * As this code path is only taken by shared catalogs, who cannot
+ * have multiple indexes on their toast relation, simply return
+ * an error.
+ */
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+ }
+
+ heap_close(toastRel1, AccessExclusiveLock);
+ heap_close(toastRel2, AccessExclusiveLock);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
@@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
- OIDOldHeap);
- RenameRelationInternal(toastidx,
- NewToastName, true);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+ /*
+ * The first index keeps the former toast name and the
+ * following entries have a suffix appended.
+ */
+ if (count == 0)
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+ OIDOldHeap);
+ else
+ snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+ OIDOldHeap, count);
+ RenameRelationInternal(lfirst_oid(lc),
+ NewToastName, true);
+ count++;
+ }
}
relation_close(newrel, NoLock);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8294b29..2b777da 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8728,7 +8728,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8736,6 +8735,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8863,8 +8870,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid toastidxid = lfirst_oid(lc);
+ if (OidIsValid(toastidxid))
+ ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index fb57621..f721dbb 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -575,8 +575,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -588,7 +588,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..9ee6ea7 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,8 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +351,20 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, toastRel->rd_indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ec956ad..ac42389 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2781,16 +2781,16 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
+ pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2816,7 +2816,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* every toast table has at least one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 2225787..49c4f6f 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -94,7 +93,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 29
+#define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,23 +106,22 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relfrozenxid 26
-#define Anum_pg_class_relminmxid 27
-#define Anum_pg_class_relacl 28
-#define Anum_pg_class_reloptions 29
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relispopulated 24
+#define Anum_pg_class_relfrozenxid 25
+#define Anum_pg_class_relminmxid 26
+#define Anum_pg_class_relacl 27
+#define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
@@ -138,13 +136,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
typedef Relation *RelationPtr;
/*
+ * RelationGetIndexListIfValid
+ * Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+ if (rel->rd_indexvalid == 0) \
+ RelationGetIndexList(rel); \
+} while(0)
+
+/*
* Routines to open (lookup) and close a relcache entry
*/
extern Relation RelationIdGetRelation(Oid relationId);
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..6febc50 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (pg_stat_get_blocks_fetched(x.indrelid) - pg_stat_get_blocks_hit(x.indrelid)) AS tidx_blks_read, +
+ | pg_stat_get_blocks_hit(x.indrelid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index d5a3571..6361297 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Hi,
On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..3a6342c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
          "INSERT INTO info_rels "
          "SELECT reltoastrelid "
          "FROM info_rels i JOIN pg_catalog.pg_class c "
-         "  ON i.reloid = c.oid"));
+         "  ON i.reloid = c.oid "
+         "  AND c.reltoastrelid != %u", InvalidOid));
  PQclear(executeQueryOrDie(conn,
          "INSERT INTO info_rels "
-         "SELECT reltoastidxid "
-         "FROM info_rels i JOIN pg_catalog.pg_class c "
-         "  ON i.reloid = c.oid"));
+         "SELECT indexrelid "
+         "FROM pg_index "
+         "WHERE indrelid IN (SELECT reltoastrelid "
+         "    FROM pg_class "
+         "    WHERE oid >= %u "
+         "    AND reltoastrelid != %u)",
+         FirstNormalObjectId, InvalidOid));
What's the idea behind the >= here?
I think we should ignore the invalid indexes in that SELECT?
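For what it's worth, folding an indisvalid filter into that query could look something like the following sketch, which joins through pg_upgrade's temporary info_rels table instead of scanning pg_class directly (table and column names as used in the patch; this is an illustration, not the patch's final text):

```sql
-- Sketch only: collect the OIDs of *valid* toast indexes belonging to
-- relations already listed in pg_upgrade's temporary info_rels table.
INSERT INTO info_rels
SELECT x.indexrelid
FROM pg_catalog.pg_index x
     JOIN info_rels i ON x.indrelid = i.reloid
WHERE x.indisvalid;
```

This assumes the preceding INSERT has already added the reltoastrelid entries to info_rels, so the join only ever picks up toast indexes of tracked relations.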
@@ -1392,19 +1390,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 }

 /*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can actually be safely done only if the
+ * relations have indexes.
  */
 if (swap_toast_by_content &&
-  relform1->reltoastidxid && relform2->reltoastidxid)
-  swap_relation_files(relform1->reltoastidxid,
-            relform2->reltoastidxid,
-            target_is_pg_class,
-            swap_toast_by_content,
-            is_internal,
-            InvalidTransactionId,
-            InvalidMultiXactId,
-            mapped_tables);
+  relform1->reltoastrelid &&
+  relform2->reltoastrelid)
+ {
+  Relation toastRel1, toastRel2;
+
+  /* Open relations */
+  toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+  toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+  /* Obtain index list */
+  RelationGetIndexList(toastRel1);
+  RelationGetIndexList(toastRel2);
+
+  /* Check if the swap is possible for all the toast indexes */
+  if (list_length(toastRel1->rd_indexlist) == 1 &&
+    list_length(toastRel2->rd_indexlist) == 1)
+  {
+   ListCell *lc1, *lc2;
+
+   /* Now swap each couple */
+   lc2 = list_head(toastRel2->rd_indexlist);
+   foreach(lc1, toastRel1->rd_indexlist)
+   {
+    Oid indexOid1 = lfirst_oid(lc1);
+    Oid indexOid2 = lfirst_oid(lc2);
+    swap_relation_files(indexOid1,
+              indexOid2,
+              target_is_pg_class,
+              swap_toast_by_content,
+              is_internal,
+              InvalidTransactionId,
+              InvalidMultiXactId,
+              mapped_tables);
+    lc2 = lnext(lc2);
+   }
Why are you iterating over the indexlists after checking they are both
of length == 1? Looks like the code would be noticeably shorter without
that.
+  }
+  else
+  {
+   /*
+    * As this code path is only taken by shared catalogs, who cannot
+    * have multiple indexes on their toast relation, simply return
+    * an error.
+    */
+   ereport(ERROR,
+       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+        errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+  }
+
Absolutely minor thing, using an elog() seems to be better here since
that uses the appropriate error code for some codepath that's not
expected to be executed.
/* Clean up. */
heap_freetuple(reltup1);
@@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;

  toastrel = relation_open(newrel->rd_rel->reltoastrelid, AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
  relation_close(toastrel, AccessShareLock);

 /* rename the toast table ... */
@@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
          NewToastName, true);

- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
-      OIDOldHeap);
- RenameRelationInternal(toastidx,
-            NewToastName, true);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+  /*
+   * The first index keeps the former toast name and the
+   * following entries have a suffix appended.
+   */
+  if (count == 0)
+   snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+        OIDOldHeap);
+  else
+   snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+        OIDOldHeap, count);
+  RenameRelationInternal(lfirst_oid(lc),
+             NewToastName, true);
+  count++;
+ }
 }
 relation_close(newrel, NoLock);
 }
Is it actually possible to get here with multiple toast indexes?
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ec956ad..ac42389 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2781,16 +2781,16 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
  Oid pg_class_reltoastidxid;

  appendPQExpBuffer(upgrade_query,
-          "SELECT c.reltoastrelid, t.reltoastidxid "
+          "SELECT c.reltoastrelid, t.indexrelid "
           "FROM pg_catalog.pg_class c LEFT JOIN "
-          "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
-          "WHERE c.oid = '%u'::pg_catalog.oid;",
+          "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+          "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
           pg_class_oid);
This possibly needs a version qualification due to querying
indisvalid. How far back do we support pg_upgrade?
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
 typedef Relation *RelationPtr;

 /*
+ * RelationGetIndexListIfValid
+ * Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+ if (rel->rd_indexvalid == 0) \
+  RelationGetIndexList(rel); \
+} while(0)
Isn't this function misnamed and should be
RelationGetIndexListIfInValid?
Going to do some performance tests now.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
On 2013-06-18 11:35:10 +0200, Andres Freund wrote:
Going to do some performance tests now.
Ok, so ran the worst case load I could think of and didn't notice
any relevant performance changes.
The test I ran was:
CREATE TABLE test_toast(id serial primary key, data text);
ALTER TABLE test_toast ALTER COLUMN data SET STORAGE external;
INSERT INTO test_toast(data) SELECT repeat('a', 8000) FROM generate_series(1, 200000);
VACUUM FREEZE test_toast;
And then with that:
\setrandom id 1 200000
SELECT id, substring(data, 1, 10) FROM test_toast WHERE id = :id;
Which should really stress the potentially added overhead since we're
doing many toast accesses, but always only fetch one chunk.
One other thing: Your latest patch forgot to adjust rules.out, so make
check didn't pass...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
An updated patch for the toast part is attached.
On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Here are the review comments of the removal_of_reltoastidxid patch.
I've not completed the review yet, but I'd like to post the current comments
before going to bed ;)

*** a/src/backend/catalog/system_views.sql
-       pg_stat_get_blocks_fetched(X.oid) -
-           pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
-       pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+       pg_stat_get_blocks_fetched(X.indrelid) -
+           pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+       pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

ISTM that X.indrelid indicates the TOAST table, not the TOAST index.
Shouldn't we use X.indexrelid instead of X.indrelid?

Indeed, good catch! We need the statistics on the index in this case,
and here I used the table OID. Btw, I also noticed that as multiple
indexes may be involved for a given toast relation, it makes sense to
actually calculate tidx_blks_read and tidx_blks_hit as the sum of all
stats of the indexes.
Yep. You seem to need to change X.indexrelid to X.indrelid in the GROUP BY clause.
Otherwise, you may get two rows of the same table from pg_statio_all_tables.
doc/src/sgml/diskusage.sgml
There will be one index on the
<acronym>TOAST</> table, if present.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
When I used gdb and tracked the code path of concurrent reindex patch,
I found it's possible that more than one *valid* toast indexes appear. Those
multiple valid toast indexes are viewable, for example, from pg_indexes.
I'm not sure whether this is the bug of concurrent reindex patch. But
if it's not,
you seem to need to change the above description again.
*** a/src/bin/pg_dump/pg_dump.c
+          "SELECT c.reltoastrelid, t.indexrelid "
           "FROM pg_catalog.pg_class c LEFT JOIN "
-          "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
-          "WHERE c.oid = '%u'::pg_catalog.oid;",
+          "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+          "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+          "LIMIT 1",

Is there a case where a TOAST table has more than one *valid* index?

I just rechecked the patch and the answer is no. The concurrent index
is set as valid inside the same transaction as swap. So only the
backend performing the swap will be able to see two valid toast
indexes at the same time.
According to my quick gdb testing, this seems not to be true....
Regards,
--
Fujii Masao
On Tue, Jun 18, 2013 at 9:54 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,
On 2013-06-18 11:35:10 +0200, Andres Freund wrote:
Going to do some performance tests now.
Ok, so ran the worst case load I could think of and didn't notice
any relevant performance changes.

The test I ran was:
CREATE TABLE test_toast(id serial primary key, data text);
ALTER TABLE test_toast ALTER COLUMN data SET STORAGE external;
INSERT INTO test_toast(data) SELECT repeat('a', 8000) FROM generate_series(1, 200000);
VACUUM FREEZE test_toast;

And then with that:
\setrandom id 1 200000
SELECT id, substring(data, 1, 10) FROM test_toast WHERE id = :id;

Which should really stress the potentially added overhead since we're
doing many toast accesses, but always only fetch one chunk.
Sounds really good!
Regards,
--
Fujii Masao
On Wed, Jun 19, 2013 at 12:36 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

An updated patch for the toast part is attached.
On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Here are the review comments of the removal_of_reltoastidxid patch.
I've not completed the review yet, but I'd like to post the current comments
before going to bed ;)

*** a/src/backend/catalog/system_views.sql
-       pg_stat_get_blocks_fetched(X.oid) -
-           pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
-       pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+       pg_stat_get_blocks_fetched(X.indrelid) -
+           pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+       pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

ISTM that X.indrelid indicates the TOAST table, not the TOAST index.
Shouldn't we use X.indexrelid instead of X.indrelid?

Indeed, good catch! We need the statistics on the index in this case,
and here I used the table OID. Btw, I also noticed that as multiple
indexes may be involved for a given toast relation, it makes sense to
actually calculate tidx_blks_read and tidx_blks_hit as the sum of all
stats of the indexes.

Yep. You seem to need to change X.indexrelid to X.indrelid in the GROUP BY clause.
Otherwise, you may get two rows of the same table from pg_statio_all_tables.
I changed it a little bit in a different way in my latest patch by
adding a sum on all the indexes when getting tidx_blks stats.
doc/src/sgml/diskusage.sgml
There will be one index on the
<acronym>TOAST</> table, if present.

+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes

When I used gdb and tracked the code path of concurrent reindex patch,
I found it's possible that more than one *valid* toast indexes appear. Those
multiple valid toast indexes are viewable, for example, from pg_indexes.
I'm not sure whether this is the bug of concurrent reindex patch. But
if it's not,
you seem to need to change the above description again.
Not sure about that. The latest code is made such that only one valid
index is present on the toast relation at a time.
*** a/src/bin/pg_dump/pg_dump.c
+          "SELECT c.reltoastrelid, t.indexrelid "
           "FROM pg_catalog.pg_class c LEFT JOIN "
-          "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
-          "WHERE c.oid = '%u'::pg_catalog.oid;",
+          "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+          "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid "
+          "LIMIT 1",

Is there a case where a TOAST table has more than one *valid* index?

I just rechecked the patch and the answer is no. The concurrent index
is set as valid inside the same transaction as swap. So only the
backend performing the swap will be able to see two valid toast
indexes at the same time.

According to my quick gdb testing, this seems not to be true....
Well, I have to disagree. I am not able to reproduce it. Which version
did you use? Here is what I get with the latest version of REINDEX
CONCURRENTLY patch... I checked with the following process:
1) Create this table:
CREATE TABLE aa (a int, b text);
ALTER TABLE aa ALTER COLUMN b SET STORAGE EXTERNAL;
2) Create session 1 and take a breakpoint on
ReindexRelationConcurrently:indexcmds.c
3) Launch REINDEX TABLE CONCURRENTLY aa
4) With a 2nd session, go through all the phases of the process and
scan the validity of the toast indexes with the following query:
ioltas=# select pg_class.relname, indisvalid, indisready from
pg_class, pg_index where pg_class.reltoastrelid = pg_index.indrelid
and pg_class.relname = 'aa';
relname | indisvalid | indisready
---------+------------+------------
aa | t | t
aa | f | t
(2 rows)
When scanning all the phases with the 2nd psql session (the concurrent
index creation, build, validation, swap, and drop of the concurrent
index), at no moment did I see indisvalid set to true for the two
indexes at the same time. indisready was of course changed to prepare
the concurrent index to be ready for inserts, but that was all and
this is part of the process.
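For readers reproducing this, a slightly extended variant of that monitoring query makes the two pg_index rows easier to tell apart (the regclass cast and aliases are illustrative additions, not part of the original test):

```sql
-- Illustrative: show each toast index of table 'aa' by name together
-- with its validity flags, so the invalid concurrent entry stands out.
SELECT i.indexrelid::regclass AS toast_index,
       i.indisvalid,
       i.indisready
FROM pg_class c
     JOIN pg_index i ON c.reltoastrelid = i.indrelid
WHERE c.relname = 'aa';
```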
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Please find an updated patch. The regression test rules has been
updated, and all the comments are addressed.
On Tue, Jun 18, 2013 at 6:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,
On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..3a6342c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
          "INSERT INTO info_rels "
          "SELECT reltoastrelid "
          "FROM info_rels i JOIN pg_catalog.pg_class c "
-         "  ON i.reloid = c.oid"));
+         "  ON i.reloid = c.oid "
+         "  AND c.reltoastrelid != %u", InvalidOid));
  PQclear(executeQueryOrDie(conn,
          "INSERT INTO info_rels "
-         "SELECT reltoastidxid "
-         "FROM info_rels i JOIN pg_catalog.pg_class c "
-         "  ON i.reloid = c.oid"));
+         "SELECT indexrelid "
+         "FROM pg_index "
+         "WHERE indrelid IN (SELECT reltoastrelid "
+         "    FROM pg_class "
+         "    WHERE oid >= %u "
+         "    AND reltoastrelid != %u)",
+         FirstNormalObjectId, InvalidOid));

What's the idea behind the >= here?
It is here to avoid fetching the toast relations of system tables. But
I see your point, the inner query fetching the toast OIDs should do a
join on the existing info_rels and not try to do a join on a plain
pg_index, so changed this way.
I think we should ignore the invalid indexes in that SELECT?
Yes indeed, it doesn't make sense to grab invalid toast indexes.
Changed this way.
@@ -1392,19 +1390,62 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 }

 /*
- * If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * If we're swapping two toast tables by content, do the same for all of
+ * their indexes. The swap can actually be safely done only if the
+ * relations have indexes.
  */
 if (swap_toast_by_content &&
-  relform1->reltoastidxid && relform2->reltoastidxid)
-  swap_relation_files(relform1->reltoastidxid,
-            relform2->reltoastidxid,
-            target_is_pg_class,
-            swap_toast_by_content,
-            is_internal,
-            InvalidTransactionId,
-            InvalidMultiXactId,
-            mapped_tables);
+  relform1->reltoastrelid &&
+  relform2->reltoastrelid)
+ {
+  Relation toastRel1, toastRel2;
+
+  /* Open relations */
+  toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
+  toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
+
+  /* Obtain index list */
+  RelationGetIndexList(toastRel1);
+  RelationGetIndexList(toastRel2);
+
+  /* Check if the swap is possible for all the toast indexes */
+  if (list_length(toastRel1->rd_indexlist) == 1 &&
+    list_length(toastRel2->rd_indexlist) == 1)
+  {
+   ListCell *lc1, *lc2;
+
+   /* Now swap each couple */
+   lc2 = list_head(toastRel2->rd_indexlist);
+   foreach(lc1, toastRel1->rd_indexlist)
+   {
+    Oid indexOid1 = lfirst_oid(lc1);
+    Oid indexOid2 = lfirst_oid(lc2);
+    swap_relation_files(indexOid1,
+              indexOid2,
+              target_is_pg_class,
+              swap_toast_by_content,
+              is_internal,
+              InvalidTransactionId,
+              InvalidMultiXactId,
+              mapped_tables);
+    lc2 = lnext(lc2);
+   }

Why are you iterating over the indexlists after checking they are both
of length == 1? Looks like the code would be noticeably shorter without
that.
OK. Modified this way.
+  }
+  else
+  {
+   /*
+    * As this code path is only taken by shared catalogs, who cannot
+    * have multiple indexes on their toast relation, simply return
+    * an error.
+    */
+   ereport(ERROR,
+       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+        errmsg("cannot swap relation files of a shared catalog with multiple indexes on toast relation")));
+  }
+

Absolutely minor thing, using an elog() seems to be better here since
that uses the appropriate error code for some codepath that's not
expected to be executed.
OK. Modified this way.
 /* Clean up. */
 heap_freetuple(reltup1);

@@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 if (OidIsValid(newrel->rd_rel->reltoastrelid))
 {
  Relation toastrel;
- Oid toastidx;
  char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;

  toastrel = relation_open(newrel->rd_rel->reltoastrelid, AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
+ RelationGetIndexList(toastrel);
  relation_close(toastrel, AccessShareLock);

 /* rename the toast table ... */
@@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
          NewToastName, true);

- /* ... and its index too */
- snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
-      OIDOldHeap);
- RenameRelationInternal(toastidx,
-            NewToastName, true);
+ /* ... and its indexes too */
+ foreach(lc, toastrel->rd_indexlist)
+ {
+  /*
+   * The first index keeps the former toast name and the
+   * following entries have a suffix appended.
+   */
+  if (count == 0)
+   snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+        OIDOldHeap);
+  else
+   snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+        OIDOldHeap, count);
+  RenameRelationInternal(lfirst_oid(lc),
+             NewToastName, true);
+  count++;
+ }
 }
 relation_close(newrel, NoLock);
 }

Is it actually possible to get here with multiple toast indexes?
Actually it is possible. finish_heap_swap is also called, for example,
by ALTER TABLE when rewriting the table (phase 3), so I think it is
better to protect this code path this way.
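Under the renaming scheme quoted above, any extra toast indexes would be visible by name, for example with a query along these lines (the pattern is illustrative):

```sql
-- Illustrative: list toast indexes, including any "_<n>"-suffixed
-- entries produced by the multi-index rename loop in finish_heap_swap.
SELECT c.relname
FROM pg_class c
WHERE c.relname LIKE 'pg\_toast\_%\_index%' ESCAPE '\'
ORDER BY c.relname;
```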
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ec956ad..ac42389 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2781,16 +2781,16 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
  Oid pg_class_reltoastidxid;

  appendPQExpBuffer(upgrade_query,
-          "SELECT c.reltoastrelid, t.reltoastidxid "
+          "SELECT c.reltoastrelid, t.indexrelid "
           "FROM pg_catalog.pg_class c LEFT JOIN "
-          "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
-          "WHERE c.oid = '%u'::pg_catalog.oid;",
+          "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+          "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
           pg_class_oid);

This possibly needs a version qualification due to querying
indisvalid. How far back do we support pg_upgrade?
By having a look at the docs, pg_upgrade was added in 9.0 and
supports upgrades from versions >= 8.3.X. indisvalid was added in
8.2, so we are fine.
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
 typedef Relation *RelationPtr;

 /*
+ * RelationGetIndexListIfValid
+ * Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+ if (rel->rd_indexvalid == 0) \
+  RelationGetIndexList(rel); \
+} while(0)

Isn't this function misnamed and should be
RelationGetIndexListIfInValid?
When naming it, I had more in mind "get the list of indexes if it
is already there". It looks more intuitive to my mind.
--
Michael
Attachments:
20130618_2_reindex_concurrently_v27.patch
*** a/doc/src/sgml/mvcc.sgml
--- b/doc/src/sgml/mvcc.sgml
***************
*** 863,870 **** ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
! <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
! some forms of <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
--- 863,871 ----
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
! <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
! <command>REINDEX CONCURRENTLY</> and some forms of
! <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
*** a/doc/src/sgml/ref/reindex.sgml
--- b/doc/src/sgml/ref/reindex.sgml
***************
*** 21,27 **** PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
! REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
--- 21,27 ----
<refsynopsisdiv>
<synopsis>
! REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
***************
*** 68,76 **** REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
! <command>REINDEX</> will not perform a concurrent build. To build the
! index without interfering with production you should drop the index and
! reissue the <command>CREATE INDEX CONCURRENTLY</> command.
</para>
</listitem>
--- 68,88 ----
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
! <command>REINDEX</> will perform a concurrent build if <literal>
! CONCURRENTLY</> is specified. To build the index without interfering
! with production you should drop the index and reissue either the
! <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
! command. Indexes of toast relations can be rebuilt with <command>REINDEX
! CONCURRENTLY</>.
! </para>
! </listitem>
!
! <listitem>
! <para>
! Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
! EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
! DROP CONSTRAINT</>. The same applies to <literal>UNIQUE</> indexes
! backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>.
</para>
</listitem>
***************
*** 139,144 **** REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
--- 151,171 ----
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ &mdash; see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
***************
*** 231,236 **** REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
--- 258,376 ----
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index rebuild, a new index whose storage will replace that
+ of the index being rebuilt is first entered into the system catalogs in one
+ transaction, then two table scans occur in two more transactions. Once this
+ is done, the old and new indexes are swapped: the concurrent index is marked
+ as valid, the storage of the two indexes is exchanged, and the old index is
+ marked as invalid. An exclusive lock is taken during this swap phase.
+ Finally, two additional transactions are used to mark the concurrent index
+ as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+ <programlisting>
+ postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+ --------+---------+-----------
+ col | integer |
+ Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+ </programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and run <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This applies to indexes of toast relations
+ as well.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Live indexes of toast relations
+ cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only the non-system relations concurrently. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved in the operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is done with <literal>SHARE UPDATE EXCLUSIVE</literal>
+ locks, except during the relation swap, where an <literal>ACCESS EXCLUSIVE</literal>
+ lock is taken.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
***************
*** 262,268 **** $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
! </programlisting></para>
</refsect1>
<refsect1>
--- 402,419 ----
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
! </programlisting>
! </para>
!
! <para>
! Rebuild all the indexes of a table, allowing read and write operations
! on the involved relations while the rebuild is in progress:
!
! <programlisting>
! REINDEX TABLE CONCURRENTLY my_broken_table;
! </programlisting>
! </para>
!
</refsect1>
<refsect1>
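To illustrate the recovery path described in the documentation above, here is a hypothetical session (table and index names are examples):

```sql
-- A concurrent rebuild that fails, e.g. on a uniqueness violation,
-- leaves an invalid index with the _cct suffix behind.
REINDEX TABLE CONCURRENTLY tab;   -- fails, leaving "idx_cct" INVALID

-- Recommended recovery: drop the invalid copy and retry.
DROP INDEX idx_cct;
REINDEX TABLE CONCURRENTLY tab;
```
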
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 43,51 ****
--- 43,53 ----
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+ #include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+ #include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
***************
*** 672,677 **** UpdateIndexRelation(Oid indexoid,
--- 674,683 ----
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index used as a duplicate of an existing
+ * index, as done during a concurrent reindex operation. The index can
+ * also be on a toast relation. Sufficient locks are normally already
+ * held on the related relations when this is called during a concurrent
+ * operation.
*
* Returns the OID of the created index.
*/
***************
*** 695,701 **** index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
! bool is_internal)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
--- 701,708 ----
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
! bool is_internal,
! bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
***************
*** 738,756 **** index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
! * release locks before committing in catalogs
*/
if (concurrent &&
! IsSystemRelation(heapRelation))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
! * This case is currently not supported, but there's no way to ask for it
! * in the grammar anyway, so it can't happen.
*/
! if (concurrent && is_exclusion)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
--- 745,766 ----
/*
* concurrent index build on a system catalog is unsafe because we tend to
! * release locks before committing in catalogs. If the index is created during
! * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
! IsSystemRelation(heapRelation) &&
! !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
! * This case is currently only supported during a concurrent index
! * rebuild, but there is no way to ask for it in the grammar otherwise
! * anyway.
*/
! if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
***************
*** 1090,1095 **** index_create(Relation heapRelation,
--- 1100,1537 ----
return indexRelationId;
}
+
+ /*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into the catalogs and needs to be
+ * built later on. This is called during concurrent index processing. The
+ * heap relation on which the index is based needs to be closed by the
+ * caller.
+ */
+ Oid
+ index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+ {
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* The concurrent index uses the same index information as the former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the chosen name conflicts with an existing column name,
+ * and adjust it if necessary.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, NoLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+ }
+
+
+ /*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken
+ * during this operation so that only schema changes are prevented.
+ */
+ void
+ index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+ {
+ Relation rel,
+ indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in the
+ * commit of the transaction in which this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+ }
+
+
+ /*
+ * index_concurrent_swap
+ *
+ * Swap the old and new indexes in a concurrent context. For the time being
+ * this only switches the relfilenode of the two indexes. If extra operations
+ * become necessary during a concurrent swap, they should be added here.
+ * AccessExclusiveLock is taken on the swapped index relations until the end
+ * of the transaction in which this function is called.
+ * Note: a weaker lock could be taken if the catalog caches using SnapshotNow
+ * were correctly MVCC'd.
+ */
+ void
+ index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+ {
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take an exclusive lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, AccessExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, AccessExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swapping happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+ }
+
+ /*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+ void
+ index_concurrent_set_dead(Oid indexId, Oid heapId, LOCKTAG locktag)
+ {
+ Relation heapRelation;
+ Relation indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexId, INDEX_DROP_SET_DEAD, true);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+ }
+
+ /*
+ * index_concurrent_clear_valid
+ *
+ * Clear the valid state of a given index and invalidate the relcache of
+ * its parent relation. In a concurrent context, this function should be
+ * called when initializing an index drop, before setting the index as
+ * dead.
+ */
+ void
+ index_concurrent_clear_valid(Relation heapRelation,
+ Oid indexOid,
+ bool concurrent)
+ {
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_CLEAR_VALID, concurrent);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(heapRelation);
+ }
+
+ /*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of a concurrent index
+ * process. Deletion is done through performDeletion; otherwise the
+ * dependencies of the index would not get dropped. At this point the index
+ * is already considered invalid and dead, so it can be dropped without any
+ * concurrent options, as it is certain not to interact with other server
+ * sessions.
+ */
+ void
+ index_concurrent_drop(Oid indexOid)
+ {
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+ }
+
+
/*
* index_constraint_create
*
***************
*** 1325,1331 **** index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
--- 1767,1772 ----
***************
*** 1407,1423 **** index_drop(Oid indexId, bool concurrent)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
! /*
! * Mark index invalid by updating its pg_index entry
! */
! index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
!
! /*
! * Invalidate the relcache for the table, so that after this commit
! * all sessions will refresh any cached plans that might reference the
! * index.
! */
! CacheInvalidateRelcache(userHeapRelation);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
--- 1848,1855 ----
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
! /* Mark the index as invalid */
! index_concurrent_clear_valid(userHeapRelation, indexId, true);
/* save lockrelid and locktag for below, then close but keep locks */
heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
***************
*** 1445,1507 **** index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
! /*
! * Now we must wait until no running transaction could be using the
! * index for a query. To do this, inquire which xacts currently would
! * conflict with AccessExclusiveLock on the table -- ie, which ones
! * have a lock of any kind on the table. Then wait for each of these
! * xacts to commit or abort. Note we do not need to worry about xacts
! * that open the table for reading after this point; they will see the
! * index as invalid when they open the relation.
! *
! * Note: the reason we use actual lock acquisition here, rather than
! * just checking the ProcArray and sleeping, is that deadlock is
! * possible if one of the transactions in question is blocked trying
! * to acquire an exclusive lock on our table. The lock code will
! * detect deadlock and error out properly.
! *
! * Note: GetLockConflicts() never reports our own xid, hence we need
! * not check for that. Also, prepared xacts are not reported, which
! * is fine since they certainly aren't going to do anything more.
! */
! old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
!
! while (VirtualTransactionIdIsValid(*old_lockholders))
! {
! VirtualXactLock(*old_lockholders, true);
! old_lockholders++;
! }
!
! /*
! * No more predicate locks will be acquired on this index, and we're
! * about to stop doing inserts into the index which could show
! * conflicts with existing predicate locks, so now is the time to move
! * them to the heap relation.
! */
! userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
! userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
! TransferPredicateLocksToHeapRelation(userIndexRelation);
!
! /*
! * Now we are sure that nobody uses the index for queries; they just
! * might have it open for updating it. So now we can unset indisready
! * and indislive, then wait till nobody could be using it at all
! * anymore.
! */
! index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
!
! /*
! * Invalidate the relcache for the table, so that after this commit
! * all sessions will refresh the table's index list. Forgetting just
! * the index's relcache entry is not enough.
! */
! CacheInvalidateRelcache(userHeapRelation);
!
! /*
! * Close the relations again, though still holding session lock.
! */
! heap_close(userHeapRelation, NoLock);
! index_close(userIndexRelation, NoLock);
/*
* Again, commit the transaction to make the pg_index update visible
--- 1877,1884 ----
CommitTransactionCommand();
StartTransactionCommand();
! /* Finish invalidation of index and mark it as dead */
! index_concurrent_set_dead(indexId, heapId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
***************
*** 1514,1526 **** index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
! old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
!
! while (VirtualTransactionIdIsValid(*old_lockholders))
! {
! VirtualXactLock(*old_lockholders, true);
! old_lockholders++;
! }
/*
* Re-open relations to allow us to complete our actions.
--- 1891,1897 ----
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
! WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
***************
*** 2991,3017 **** validate_index_heapscan(Relation heapRelation,
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
! * flags that denote the index's state. We must use an in-place update of
! * the pg_index tuple, because we do not have exclusive lock on the parent
! * table and so other sessions might concurrently be doing SnapshotNow scans
! * of pg_index to identify the table's indexes. A transactional update would
! * risk somebody not seeing the index at all. Because the update is not
! * transactional and will not roll back on error, this must only be used as
! * the last step in a transaction that has not made any transactional catalog
! * updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
! index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
! /* Assert that current xact hasn't done any transactional updates */
! Assert(GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
--- 3362,3393 ----
* index_set_state_flags - adjust pg_index state flags
*
* This is used during CREATE/DROP INDEX CONCURRENTLY to adjust the pg_index
! * flags that denote the index's state. If this function is called in a
! * concurrent process, we use an in-place update of the pg_index tuple,
! * because we do not have exclusive lock on the parent table and so other
! * sessions might concurrently be doing SnapshotNow scans of pg_index to
! * identify the table's indexes. A transactional update would risk somebody
! * not seeing the index at all. Because the update is not transactional
! * and will not roll back on error, this must only be used as the last step
! * in a transaction that has not made any transactional catalog updates!
*
* Note that heap_inplace_update does send a cache inval message for the
* tuple, so other sessions will hear about the update as soon as we commit.
*/
void
! index_set_state_flags(Oid indexId,
! IndexStateFlagsAction action,
! bool concurrent)
{
Relation pg_index;
HeapTuple indexTuple;
Form_pg_index indexForm;
! /*
! * Assert that the current xact hasn't done any transactional updates;
! * there is nothing to worry about in a non-concurrent context.
! */
! Assert(!concurrent || GetTopTransactionIdIfAny() == InvalidTransactionId);
/* Open pg_index and fetch a writable copy of the index's tuple */
pg_index = heap_open(IndexRelationId, RowExclusiveLock);
***************
*** 3071,3078 **** index_set_state_flags(Oid indexId, IndexStateFlagsAction action)
break;
}
! /* ... and write it back in-place */
! heap_inplace_update(pg_index, indexTuple);
heap_close(pg_index, RowExclusiveLock);
}
--- 3447,3466 ----
break;
}
! /*
! * In a concurrent context, write the tuple back in-place; in a
! * non-concurrent context, do a normal transactional update.
! */
! if (concurrent)
! {
! heap_inplace_update(pg_index, indexTuple);
! }
! else
! {
! simple_heap_update(pg_index, &indexTuple->t_self, indexTuple);
! CommandCounterIncrement();
! CatalogUpdateIndexes(pg_index, indexTuple);
! }
heap_close(pg_index, RowExclusiveLock);
}
*** a/src/backend/catalog/toasting.c
--- b/src/backend/catalog/toasting.c
***************
*** 281,287 **** create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
! true, false, false, true);
heap_close(toast_rel, NoLock);
--- 281,287 ----
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
! true, false, false, false, false);
heap_close(toast_rel, NoLock);
*** a/src/backend/commands/indexcmds.c
--- b/src/backend/commands/indexcmds.c
***************
*** 68,75 **** static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
! List *colnames, List *exclusionOpNames,
! bool primary, bool isconstraint);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
--- 68,76 ----
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
! List *colnames, List *exclusionOpNames,
! bool primary, bool isconstraint,
! bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
***************
*** 311,317 **** DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
--- 312,317 ----
***************
*** 321,333 **** DefineIndex(IndexStmt *stmt,
IndexInfo *indexInfo;
int numberOfAttributes;
TransactionId limitXmin;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
--- 321,329 ----
***************
*** 454,460 **** DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
! stmt->isconstraint);
/*
* look up the access method, verify it can handle the requested features
--- 450,457 ----
indexColNames,
stmt->excludeOpNames,
stmt->primary,
! stmt->isconstraint,
! false);
/*
* look up the access method, verify it can handle the requested features
***************
*** 601,607 **** DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
! stmt->concurrent, !check_rights);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
--- 598,604 ----
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
! stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
***************
*** 664,681 **** DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
! old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
!
! while (VirtualTransactionIdIsValid(*old_lockholders))
! {
! VirtualXactLock(*old_lockholders, true);
! old_lockholders++;
! }
/*
* At this moment we are sure that there are no transactions with the
--- 661,668 ----
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
*/
! WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
***************
*** 695,728 **** DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
! /* We have to re-build the IndexInfo struct, since it was lost in commit */
! indexInfo = BuildIndexInfo(indexRelation);
! Assert(!indexInfo->ii_ReadyForInserts);
! indexInfo->ii_Concurrent = true;
! indexInfo->ii_BrokenHotChain = false;
!
! /* Now build the index */
! index_build(rel, indexRelation, indexInfo, stmt->primary, false);
!
! /* Close both the relations, but keep the locks */
! heap_close(rel, NoLock);
! index_close(indexRelation, NoLock);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
! index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY);
/* we can do away with our snapshot */
PopActiveSnapshot();
--- 682,701 ----
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
! /* Perform concurrent build of index */
! index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
! indexRelationId,
! stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
* commit this transaction, any new transactions that open the table must
* insert new entries into the index for insertions and non-HOT updates.
*/
! index_set_state_flags(indexRelationId, INDEX_CREATE_SET_READY, true);
/* we can do away with our snapshot */
PopActiveSnapshot();
***************
*** 739,751 **** DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
! old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
!
! while (VirtualTransactionIdIsValid(*old_lockholders))
! {
! VirtualXactLock(*old_lockholders, true);
! old_lockholders++;
! }
/*
* Now take the "reference snapshot" that will be used by validate_index()
--- 712,718 ----
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
! WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
***************
*** 786,864 **** DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
! * transactions that might have older snapshots. Obtain a list of VXIDs
! * of such transactions, and wait for them individually.
! *
! * We can exclude any running transactions that have xmin > the xmin of
! * our reference snapshot; their oldest snapshot must be newer than ours.
! * We can also exclude any transactions that have xmin = zero, since they
! * evidently have no live snapshot at all (and any one they might be in
! * process of taking is certainly newer than ours). Transactions in other
! * DBs can be ignored too, since they'll never even be able to see this
! * index.
! *
! * We can also exclude autovacuum processes and processes running manual
! * lazy VACUUMs, because they won't be fazed by missing index entries
! * either. (Manual ANALYZEs, however, can't be excluded because they
! * might be within transactions that are going to do arbitrary operations
! * later.)
! *
! * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
! * check for that.
! *
! * If a process goes idle-in-transaction with xmin zero, we do not need to
! * wait for it anymore, per the above argument. We do not have the
! * infrastructure right now to stop waiting if that happens, but we can at
! * least avoid the folly of waiting when it is idle at the time we would
! * begin to wait. We do this by repeatedly rechecking the output of
! * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
! * doesn't show up in the output, we know we can forget about it.
*/
! old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
! PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
! &n_old_snapshots);
!
! for (i = 0; i < n_old_snapshots; i++)
! {
! if (!VirtualTransactionIdIsValid(old_snapshots[i]))
! continue; /* found uninteresting in previous cycle */
!
! if (i > 0)
! {
! /* see if anything's changed ... */
! VirtualTransactionId *newer_snapshots;
! int n_newer_snapshots;
! int j;
! int k;
!
! newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
! true, false,
! PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
! &n_newer_snapshots);
! for (j = i; j < n_old_snapshots; j++)
! {
! if (!VirtualTransactionIdIsValid(old_snapshots[j]))
! continue; /* found uninteresting in previous cycle */
! for (k = 0; k < n_newer_snapshots; k++)
! {
! if (VirtualTransactionIdEquals(old_snapshots[j],
! newer_snapshots[k]))
! break;
! }
! if (k >= n_newer_snapshots) /* not there anymore */
! SetInvalidVirtualTransactionId(old_snapshots[j]);
! }
! pfree(newer_snapshots);
! }
!
! if (VirtualTransactionIdIsValid(old_snapshots[i]))
! VirtualXactLock(old_snapshots[i], true);
! }
/*
* Index can now be marked valid -- update its pg_index entry
*/
! index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID);
/*
* The pg_index update will cause backends (including this one) to update
--- 753,766 ----
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
! * transactions that might have older snapshots.
*/
! WaitForOldSnapshots(limitXmin);
/*
* Index can now be marked valid -- update its pg_index entry
*/
! index_set_state_flags(indexRelationId, INDEX_CREATE_SET_VALID, true);
/*
* The pg_index update will cause backends (including this one) to update
***************
*** 880,885 **** DefineIndex(IndexStmt *stmt,
--- 782,1331 ----
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation
+ * can be either an index or a table. If a table is specified, each step
+ * of the reindexing process is applied to all of the table's indexes at
+ * once, including its dependent toast indexes.
+ */
+ bool
+ ReindexRelationConcurrently(Oid relationOid)
+ {
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+ TransactionId limitXmin;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt, based on
+ * the relation Oid given by the caller. If the relkind of the given
+ * relation is a table, all its valid indexes will be rebuilt, including
+ * the indexes of its toast table. If the relkind is an index, the index
+ * itself will be rebuilt. The locks taken on the parent relations and
+ * the involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation on which the index is based cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index rebuilt, a new index based on
+ * the same definition as the former one; it is only registered in the
+ * catalogs at this point and will be built later. This step can be
+ * performed for all the indexes of a parent relation at once, including
+ * the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation; it might be a plain or a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also
+ * needed on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each index relation from being dropped,
+ * then close the relations. A palloc'd copy is appended to the list, as
+ * the local variable does not survive this loop iteration. The lockrelid
+ * of the parent relation is not taken here, to avoid taking multiple
+ * locks on the same relation; we rely on parentRelationIds built earlier
+ * instead.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the lock information of each parent relation for the subsequent
+ * wait phases, during which other backends might conflict with this
+ * session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the lockrelid of the parent relation to the
+ * list of locked relations; the local variable does not survive this
+ * loop iteration.
+ */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the
+ * indexes. This will prevent them from making incompatible HOT updates.
+ * Each new index is marked as not ready and invalid, so that no other
+ * transaction will try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on each parent relation,
+ * each old index and each concurrent index, to ensure that none of them
+ * is dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid having open transactions for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait until no running
+ * transaction could still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Close the index relation only now: closing it before the build would
+ * leave indexRel pointing at closed relation data when fetching
+ * indrelid above.
+ */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY, true);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table in the meantime.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * We can now do away with our active snapshot, but we still need to
+ * save the xmin limit to wait out transactions with older snapshots.
+ */
+ limitXmin = snapshot->xmin;
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * The concurrent index is now valid insofar as it contains all the
+ * necessary tuples. However, it might not contain tuples deleted just
+ * before the reference snapshot was taken, so we need to wait out the
+ * transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(limitXmin);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes are valid and can be used, we need to
+ * swap each concurrent index with its corresponding old index. The
+ * concurrent index is marked as valid before performing the swap; once
+ * the swap is done, the old index is marked as invalid, making it
+ * unusable by other backends once the associated transaction commits.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Relation indexRel, indexParentRel;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /*
+ * Open the old index and its parent relation. An AccessExclusiveLock,
+ * rather than a lower-level lock, is taken here to reduce the
+ * likelihood of deadlock, as a ShareUpdateExclusiveLock is already
+ * held at session level.
+ */
+ indexRel = index_open(indOid, AccessExclusiveLock);
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ AccessExclusiveLock);
+
+ /*
+ * The concurrent index can now be marked as valid before performing
+ * the swap. Note that since an exclusive lock is taken on the
+ * relations involved, it is safe to call this function as it would be
+ * called in a non-concurrent context.
+ * Note: with MVCC catalog access, a lower-level lock would be enough.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_VALID, false);
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Now mark the old index as invalid, the swap is done.
+ */
+ index_concurrent_clear_valid(indexParentRel, concurrentOid, false);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(indexParentRel);
+
+ /* Close relations opened previously for cache invalidation */
+ index_close(indexRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes, and some
+ * transactions might still be using them. Mark each of them as dead,
+ * after waiting out those transactions. Each operation is performed in
+ * a separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the LOCKTAG of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation before
+ * setting the index as dead.
+ */
+ index_concurrent_set_dead(indOid, relOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes, which now hold the old index data. This
+ * needs to be done through performDeletion, or else the dependencies of
+ * the old indexes will not be dropped. The internal mechanism of DROP
+ * INDEX CONCURRENTLY is not used, as the indexes are already considered
+ * dead and invalid here, so they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this concurrent index, which now holds the old index data */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the
+ * parent tables and on their indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+ }
+
+
+ /*
* CheckMutability
* Test whether given expression is mutable
*/
***************
*** 1542,1548 **** ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
! bool primary, bool isconstraint)
{
char *indexname;
--- 1988,1995 ----
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
! bool primary, bool isconstraint,
! bool concurrent)
{
char *indexname;
***************
*** 1568,1573 **** ChooseIndexName(const char *tabname, Oid namespaceId,
--- 2015,2027 ----
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
***************
*** 1680,1697 **** ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
! ReindexIndex(RangeVar *indexRelation)
{
Oid indOid;
Oid heapOid = InvalidOid;
! /* lock level used here should match index lock reindex_index() */
! indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
! false, false,
! RangeVarCallbackForReindexIndex,
! (void *) &heapOid);
! reindex_index(indOid, false);
return indOid;
}
--- 2134,2155 ----
* Recreate a specific index.
*/
Oid
! ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
! indOid = RangeVarGetRelidExtended(indexRelation,
! concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
! false, false,
! RangeVarCallbackForReindexIndex,
! (void *) &heapOid);
! /* Continue process for concurrent or non-concurrent case */
! if (!concurrent)
! reindex_index(indOid, false);
! else
! ReindexRelationConcurrently(indOid);
return indOid;
}
***************
*** 1760,1772 **** RangeVarCallbackForReindexIndex(const RangeVar *relation,
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
! ReindexTable(RangeVar *relation)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
! heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
! RangeVarCallbackOwnsTable, NULL);
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
--- 2218,2244 ----
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
! ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
! heapOid = RangeVarGetRelidExtended(relation,
! concurrent ? ShareUpdateExclusiveLock : ShareLock,
! false, false,
! RangeVarCallbackOwnsTable, NULL);
!
! /* Run through the concurrent process if necessary */
! if (concurrent)
! {
! if (!ReindexRelationConcurrently(heapOid))
! {
! ereport(NOTICE,
! (errmsg("table \"%s\" has no indexes",
! relation->relname)));
! }
! return heapOid;
! }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
***************
*** 1785,1791 **** ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
! ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
{
Relation relationRelation;
HeapScanDesc scan;
--- 2257,2266 ----
* That means this must not be called within a user transaction block!
*/
Oid
! ReindexDatabase(const char *databaseName,
! bool do_system,
! bool do_user,
! bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
***************
*** 1797,1802 **** ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
--- 2272,2286 ----
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed on the system catalogs, but
+ * it is on the user relations of a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system catalogs concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
***************
*** 1880,1894 **** ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
! if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
! (errmsg("table \"%s.%s\" was reindexed",
get_namespace_name(get_rel_namespace(relid)),
! get_rel_name(relid))));
PopActiveSnapshot();
CommitTransactionCommand();
}
--- 2364,2403 ----
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
!
! /* Determine if relation needs to be processed concurrently */
! process_concurrent = concurrent &&
! !IsSystemNamespace(get_rel_namespace(relid));
!
! /*
! * Reindex the relation with a concurrent or non-concurrent process.
! * System relations cannot be reindexed concurrently, but they still
! * need to be reindexed with a normal process (including pg_class),
! * as they could be corrupted, and the concurrent process itself
! * might also use them. This does not include toast relations, which
! * are reindexed when their parent relation is processed.
! */
! if (process_concurrent)
! {
! old = MemoryContextSwitchTo(private_context);
! result = ReindexRelationConcurrently(relid);
! MemoryContextSwitchTo(old);
! }
! else
! result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
!
! if (result)
ereport(NOTICE,
! (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
! get_rel_name(relid),
! process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
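The dispatch logic in the hunk above can be sketched as a tiny self-contained program. Note this is only a model: is_system_relation() and the OID threshold are stand-ins for the backend's IsSystemNamespace() check, not the real implementation.

```c
#include <stdbool.h>

/* Hypothetical stand-in for IsSystemNamespace(): OIDs below
 * FirstNormalObjectId (16384) belong to system catalogs. */
static bool is_system_relation(unsigned int relid)
{
    return relid < 16384;
}

/*
 * Mirror of the decision in ReindexDatabase: a relation is processed
 * concurrently only when CONCURRENTLY was requested and the relation
 * is not a system catalog; otherwise the normal reindex path is used.
 */
static bool use_concurrent_path(bool concurrent, unsigned int relid)
{
    return concurrent && !is_system_relation(relid);
}
```

This keeps the "system relations fall back to a normal reindex" rule in one predicate, matching the process_concurrent variable in the patch.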
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 900,905 **** RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
--- 900,937 ----
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Allow dropping a system index that might have been left invalid by a
+ * failed concurrent operation. For the time being, this only concerns
+ * indexes of toast relations that became invalid during a REINDEX
+ * CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Index entry is not valid: skip the remaining checks and allow the drop */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
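The new early-exit in RangeVarCallbackForDropRelation boils down to one predicate, sketched below. The struct and helper name are hypothetical; the real code reads pg_class and pg_index through the syscache.

```c
#include <stdbool.h>

/* Minimal model of the fields the drop callback consults */
struct index_entry
{
    bool is_system;   /* IsSystemClass() on the pg_class entry */
    bool is_index;    /* relkind == RELKIND_INDEX */
    bool indisvalid;  /* pg_index.indisvalid */
};

/*
 * A system index left invalid by a failed REINDEX CONCURRENTLY may be
 * dropped, so the callback skips the usual system-catalog protections
 * exactly when all three conditions below line up.
 */
static bool drop_allowed_for_invalid_system_index(const struct index_entry *e)
{
    return e->is_system && e->is_index && !e->indisvalid;
}
```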
*** a/src/backend/executor/execUtils.c
--- b/src/backend/executor/execUtils.c
***************
*** 1201,1206 **** check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
--- 1201,1220 ----
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and this code path cannot be reached by CREATE INDEX
+ * CONCURRENTLY, which does not support exclusion constraints. Hence it
+ * can only be reached by REINDEX CONCURRENTLY, in which case a twin of
+ * this index exists in parallel and the check has already been done on
+ * it, so the check can be bypassed here. If CREATE INDEX CONCURRENTLY
+ * ever supports exclusion constraints, this shortcut will need to be
+ * removed or revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***************
*** 3617,3622 **** _copyReindexStmt(const ReindexStmt *from)
--- 3617,3623 ----
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***************
*** 1839,1844 **** _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
--- 1839,1845 ----
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
***************
*** 6752,6780 **** opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
! REINDEX reindex_type qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
! n->relation = $3;
n->name = NULL;
$$ = (Node *)n;
}
! | REINDEX SYSTEM_P name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
! n->name = $3;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
! | REINDEX DATABASE name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
! n->name = $3;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
--- 6752,6783 ----
*****************************************************************************/
ReindexStmt:
! REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
! n->concurrent = $3;
! n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
! | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
! n->concurrent = $3;
! n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
! | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
! n->concurrent = $3;
! n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
***************
*** 2526,2531 **** XidCacheRemoveRunningXids(TransactionId xid,
--- 2526,2677 ----
LWLockRelease(ProcArrayLock);
}
+
+ /*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the given
+ * lock tags. To do this, inquire which xacts currently would conflict
+ * with lockmode on the relation referred to by each LOCKTAG -- ie, which
+ * ones have a lock that permits writing the relation. Then wait for each
+ * of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+ void
+ WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+ {
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ pfree(old_lockholders);
+ }
+
+
+ /*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock.
+ */
+ void
+ WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+ {
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+ }
+
+
+ /*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * xmin limit, because such a snapshot might not see tuples deleted just
+ * before it was taken. Obtain a list of VXIDs of such transactions, and
+ * wait for each of them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin given;
+ * their oldest snapshot must be newer than our xmin limit.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+ void
+ WaitForOldSnapshots(TransactionId limitXmin)
+ {
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+ }
+
+
#ifdef XIDCACHE_DEBUG
/*
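The recheck loop in WaitForOldSnapshots above can be modeled outside the backend. In this sketch, vxids are plain ints, the live set stands in for the output of GetCurrentVirtualXIDs(), and the waits counter marks where VirtualXactLock() would be called; it is an illustration of the algorithm, not backend code.

```c
#include <stdbool.h>
#include <stddef.h>

#define INVALID_VXID 0

/* Is vxid still present in the current live set? */
static bool vxid_is_live(int vxid, const int *live, size_t nlive)
{
    for (size_t k = 0; k < nlive; k++)
        if (live[k] == vxid)
            return true;
    return false;
}

/*
 * Before waiting on each remaining vxid, drop from the wait list any
 * vxid that no longer shows up in the live set (it went idle or
 * exited).  The first iteration skips the recheck, since the initial
 * list was just obtained.  Returns how many vxids would be waited on.
 */
static size_t waits_needed(int *old, size_t nold,
                           const int *live, size_t nlive)
{
    size_t waits = 0;

    for (size_t i = 0; i < nold; i++)
    {
        if (old[i] == INVALID_VXID)
            continue;           /* found uninteresting in a previous cycle */

        if (i > 0)
        {
            /* see if anything's changed: forget vanished vxids */
            for (size_t j = i; j < nold; j++)
                if (old[j] != INVALID_VXID &&
                    !vxid_is_live(old[j], live, nlive))
                    old[j] = INVALID_VXID;
        }

        if (old[i] != INVALID_VXID)
            waits++;            /* VirtualXactLock(old[i], true) here */
    }
    return waits;
}
```

With three old snapshots of which one disappears before its turn, only two waits are performed, which is exactly the saving the repeated recheck buys.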
*** a/src/backend/tcop/utility.c
--- b/src/backend/tcop/utility.c
***************
*** 778,793 **** standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
! ReindexIndex(stmt->relation);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
! ReindexTable(stmt->relation);
break;
case OBJECT_DATABASE:
--- 778,797 ----
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
! ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
! ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
***************
*** 799,806 **** standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
! ReindexDatabase(stmt->name,
! stmt->do_system, stmt->do_user);
break;
default:
elog(ERROR, "unrecognized object type: %d",
--- 803,810 ----
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
! ReindexDatabase(stmt->name, stmt->do_system,
! stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
*** a/src/include/catalog/index.h
--- b/src/include/catalog/index.h
***************
*** 60,66 **** extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
! bool is_internal);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
--- 60,87 ----
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
! bool is_internal,
! bool is_reindex);
!
! extern Oid index_concurrent_create(Relation heapRelation,
! Oid indOid,
! char *concurrentName);
!
! extern void index_concurrent_build(Oid heapOid,
! Oid indexOid,
! bool isprimary);
!
! extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
!
! extern void index_concurrent_set_dead(Oid indexId,
! Oid heapId,
! LOCKTAG locktag);
!
! extern void index_concurrent_clear_valid(Relation heapRelation,
! Oid indexOid,
! bool concurrent);
!
! extern void index_concurrent_drop(Oid indexOid);
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
***************
*** 100,106 **** extern double IndexBuildHeapScan(Relation heapRelation,
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
! extern void index_set_state_flags(Oid indexId, IndexStateFlagsAction action);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
--- 121,129 ----
extern void validate_index(Oid heapId, Oid indexId, Snapshot snapshot);
! extern void index_set_state_flags(Oid indexId,
! IndexStateFlagsAction action,
! bool concurrent);
extern void reindex_index(Oid indexId, bool skip_constraint_checks);
*** a/src/include/commands/defrem.h
--- b/src/include/commands/defrem.h
***************
*** 26,35 **** extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
! extern Oid ReindexIndex(RangeVar *indexRelation);
! extern Oid ReindexTable(RangeVar *relation);
extern Oid ReindexDatabase(const char *databaseName,
! bool do_system, bool do_user);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
--- 26,36 ----
bool check_rights,
bool skip_build,
bool quiet);
! extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
! extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
! bool do_system, bool do_user, bool concurrent);
! extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
*** a/src/include/nodes/parsenodes.h
--- b/src/include/nodes/parsenodes.h
***************
*** 2538,2543 **** typedef struct ReindexStmt
--- 2538,2544 ----
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
*** a/src/include/storage/procarray.h
--- b/src/include/storage/procarray.h
***************
*** 76,79 **** extern void XidCacheRemoveRunningXids(TransactionId xid,
--- 76,83 ----
int nxids, const TransactionId *xids,
TransactionId latestXid);
+ extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+ extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+ extern void WaitForOldSnapshots(TransactionId limitXmin);
+
#endif /* PROCARRAY_H */
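The collect-then-wait pattern behind the WaitForMultipleVirtualLocks declaration above can be sketched as follows. Here conflict_list stands in for the arrays returned by GetLockConflicts(), and only the waiting phase is modeled; the point is that all conflict sets are gathered before any waiting starts, so transactions that take a conflicting lock afterwards are deliberately not waited for.

```c
#include <stddef.h>

/* Stand-in for one GetLockConflicts() result: a vxid array and its length */
typedef struct
{
    const int *vxids;
    size_t     n;
} conflict_list;

/*
 * Phase 2 of the pattern: iterate over the conflict lists collected in
 * phase 1 and wait on every entry in turn.  Returns the total number of
 * waits performed (each standing for a VirtualXactLock(..., true) call).
 */
static size_t wait_for_all(const conflict_list *lists, size_t nlists)
{
    size_t waits = 0;

    for (size_t i = 0; i < nlists; i++)
        for (size_t j = 0; j < lists[i].n; j++)
            waits++;
    return waits;
}
```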
*** a/src/test/regress/expected/create_index.out
--- b/src/test/regress/expected/create_index.out
***************
*** 2721,2723 **** ORDER BY thousand;
--- 2721,2778 ----
1 | 1001
(2 rows)
+ --
+ -- Check behavior of REINDEX and REINDEX CONCURRENTLY
+ --
+ CREATE TABLE concur_reindex_tab (c1 int);
+ -- REINDEX
+ REINDEX TABLE concur_reindex_tab; -- notice
+ NOTICE: table "concur_reindex_tab" has no indexes
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ NOTICE: table "concur_reindex_tab" has no indexes
+ ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+ -- Normal index with integer column
+ CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+ -- Normal index with text column
+ CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+ -- UNIQUE index with expression
+ CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+ -- Duplicate column names
+ CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+ -- Create table for check on foreign key dependence switch with indexes swapped
+ ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+ CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+ INSERT INTO concur_reindex_tab VALUES (1, 'a');
+ INSERT INTO concur_reindex_tab VALUES (2, 'a');
+ -- Check materialized views
+ CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+ REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+ -- Check errors
+ -- Cannot run inside a transaction block
+ BEGIN;
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+ COMMIT;
+ REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ ERROR: concurrent reindex is not supported for shared relations
+ REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ ERROR: cannot reindex system concurrently
+ -- Check the relation status, there should not be invalid indexes
+ \d concur_reindex_tab
+ Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+ --------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+ Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+ Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+ DROP MATERIALIZED VIEW concur_reindex_matview;
+ DROP TABLE concur_reindex_tab, concur_reindex_tab2;
*** a/src/test/regress/sql/create_index.sql
--- b/src/test/regress/sql/create_index.sql
***************
*** 912,914 **** ORDER BY thousand;
--- 912,954 ----
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+ --
+ -- Check behavior of REINDEX and REINDEX CONCURRENTLY
+ --
+ CREATE TABLE concur_reindex_tab (c1 int);
+ -- REINDEX
+ REINDEX TABLE concur_reindex_tab; -- notice
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+ -- Normal index with integer column
+ CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+ -- Normal index with text column
+ CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+ -- UNIQUE index with expression
+ CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+ -- Duplicate column names
+ CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+ -- Create table for check on foreign key dependence switch with indexes swapped
+ ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+ CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+ INSERT INTO concur_reindex_tab VALUES (1, 'a');
+ INSERT INTO concur_reindex_tab VALUES (2, 'a');
+ -- Check materialized views
+ CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+ REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+ -- Check errors
+ -- Cannot run inside a transaction block
+ BEGIN;
+ REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ COMMIT;
+ REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+ -- Check the relation status, there should not be invalid indexes
+ \d concur_reindex_tab
+ DROP MATERIALIZED VIEW concur_reindex_matview;
+ DROP TABLE concur_reindex_tab, concur_reindex_tab2;
20130618_1_remove_reltoastidxid_v11.patch
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
***************
*** 321,332 **** get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT reltoastidxid "
! "FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
--- 321,339 ----
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT indexrelid "
! "FROM pg_index "
! "WHERE indisvalid "
! " AND indrelid IN (SELECT reltoastrelid "
! " FROM info_rels i "
! " JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u)",
! InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 1745,1759 ****
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
--- 1745,1750 ----
*** a/doc/src/sgml/diskusage.sgml
--- b/doc/src/sgml/diskusage.sgml
***************
*** 20,31 ****
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
! table (see <xref linkend="storage-toast">). There will be one index on the
! <acronym>TOAST</> table, if present. There also might be indexes associated
! with the base table. Each table and index is stored in a separate disk
! file — possibly more than one file, if the file would exceed one
! gigabyte. Naming conventions for these files are described in <xref
! linkend="storage-file-layout">.
</para>
<para>
--- 20,31 ----
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
! table (see <xref linkend="storage-toast">). There will be one valid index
! on the <acronym>TOAST</> table, if present. There also might be indexes
! associated with the base table. Each table and index is stored in a
! separate disk file — possibly more than one file, if the file would
! exceed one gigabyte. Naming conventions for these files are described
! in <xref linkend="storage-file-layout">.
</para>
<para>
***************
*** 44,50 ****
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
--- 44,50 ----
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
***************
*** 65,76 **** FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT reltoastidxid
! FROM pg_class
! WHERE oid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
--- 65,76 ----
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT indexrelid
! FROM pg_index
! WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
***************
*** 87,93 **** WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
--- 87,93 ----
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
***************
*** 101,107 **** SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
--- 101,107 ----
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***************
*** 1163,1174 **** postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
! <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
! <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
</row>
</tbody>
</tgroup>
--- 1163,1174 ----
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
! <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
! <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***************
*** 76,86 **** do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
/* ----------
--- 76,88 ----
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel,
! Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+ static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
***************
*** 1237,1244 **** static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
! Relation toastrel;
! Relation toastidx;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
--- 1239,1246 ----
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
***************
*** 1257,1271 **** toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
--- 1259,1287 ----
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID. A toast table can have multiple
! * identical indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfValid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /* Open all the indexes of the toast relation with the same lock */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
!
! /* Fetch the valid index relation to use for this operation */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
***************
*** 1330,1336 **** toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
}
else
--- 1346,1352 ----
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
***************
*** 1367,1373 **** toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
--- 1383,1390 ----
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid,
! RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
***************
*** 1384,1390 **** toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
--- 1401,1407 ----
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
***************
*** 1423,1438 **** toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! index_insert(toastidx, t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidx->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
--- 1440,1457 ----
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table for all the
! * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! for (i = 0; i < num_indexes; i++)
! index_insert(toastidxs[i], t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidxs[i]->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
***************
*** 1447,1456 **** toast_save_datum(Relation rel, Datum value,
}
/*
! * Done - close toast relation
*/
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
* Create the TOAST pointer value that we'll return
--- 1466,1477 ----
}
/*
! * Done - close toast relations
*/
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
***************
*** 1474,1484 **** toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
if (!VARATT_IS_EXTERNAL(attr))
return;
--- 1495,1508 ----
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
***************
*** 1487,1496 **** toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1511,1532 ----
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! RelationGetIndexListIfValid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /*
! * We actually use only the first valid index, but a lock needs to be
! * taken on all of them.
! */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
!
! /* Fetch the valid index relation used for the scan */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1505,1511 **** toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1541,1547 ----
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1519,1526 **** toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
--- 1555,1564 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
***************
*** 1531,1541 **** toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1569,1596 ----
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+
+ /* Ensure that the list of indexes of the toast relation is computed */
+ RelationGetIndexListIfValid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open all the index relations needed */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1548,1554 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
--- 1603,1610 ----
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel,
! RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
***************
*** 1556,1561 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
--- 1612,1622 ----
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ pfree(toastidxs);
+
return result;
}
***************
*** 1573,1579 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid);
heap_close(toastrel, AccessShareLock);
--- 1634,1640 ----
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
***************
*** 1591,1598 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
--- 1652,1659 ----
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
***************
*** 1607,1612 **** toast_fetch_datum(struct varlena * attr)
--- 1668,1676 ----
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
***************
*** 1622,1632 **** toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
--- 1686,1706 ----
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfValid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /* Open all the indexes of the toast relation with the same lock */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
!
! /* Fetch the valid index relation used for the scan */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
***************
*** 1645,1651 **** toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1719,1725 ----
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1734,1741 **** toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 1808,1817 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
***************
*** 1750,1757 **** toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
--- 1826,1833 ----
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
***************
*** 1774,1779 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
--- 1850,1858 ----
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
***************
*** 1816,1826 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
--- 1895,1912 ----
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfValid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
***************
*** 1861,1867 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1947,1953 ----
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1958,1965 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 2044,2079 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+ /* ----------
+ * toast_index_fetch_valid
+ *
+ * Get the first valid index from the list of indexes of a toast relation.
+ * The index relations must already be opened before calling this routine.
+ */
+ static Relation
+ toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+ {
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in the list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+ }
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 781,787 **** InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
--- 781,786 ----
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 103,109 **** static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! Oid reltoastidxid, double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
--- 103,109 ----
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
***************
*** 1072,1078 **** index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
--- 1072,1077 ----
***************
*** 1254,1260 **** index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
--- 1253,1258 ----
***************
*** 1764,1771 **** FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
--- 1762,1767 ----
***************
*** 1781,1788 **** FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
! bool hasindex, bool isprimary,
! Oid reltoastidxid, double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
--- 1777,1785 ----
*/
static void
index_update_stats(Relation rel,
! bool hasindex,
! bool isprimary,
! double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
***************
*** 1876,1890 **** index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
--- 1873,1878 ----
***************
*** 2072,2085 **** index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
--- 2060,2070 ----
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***************
*** 473,488 **** CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! pg_stat_get_blocks_fetched(X.oid) -
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_class X ON T.reltoastidxid = X.oid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
--- 473,488 ----
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! sum(pg_stat_get_blocks_fetched(X.indexrelid) -
! pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
! sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 1172,1179 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
--- 1172,1177 ----
***************
*** 1392,1410 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
}
/*
! * If we're swapping two toast tables by content, do the same for their
! * indexes.
*/
if (swap_toast_by_content &&
! relform1->reltoastidxid && relform2->reltoastidxid)
! swap_relation_files(relform1->reltoastidxid,
! relform2->reltoastidxid,
! target_is_pg_class,
! swap_toast_by_content,
! is_internal,
! InvalidTransactionId,
! InvalidMultiXactId,
! mapped_tables);
/* Clean up. */
heap_freetuple(reltup1);
--- 1390,1440 ----
}
/*
! * If we're swapping two toast tables by content, do the same for their
! * indexes. The swap can only be done safely if each of the two
! * relations has exactly one index.
*/
if (swap_toast_by_content &&
! relform1->reltoastrelid &&
! relform2->reltoastrelid)
! {
! Relation toastRel1, toastRel2;
!
! /* Open relations */
! toastRel1 = heap_open(relform1->reltoastrelid, AccessExclusiveLock);
! toastRel2 = heap_open(relform2->reltoastrelid, AccessExclusiveLock);
!
! /* Obtain index list */
! RelationGetIndexList(toastRel1);
! RelationGetIndexList(toastRel2);
!
! /* Check if the swap is possible for all the toast indexes */
! if (list_length(toastRel1->rd_indexlist) == 1 &&
! list_length(toastRel2->rd_indexlist) == 1)
! {
! swap_relation_files(linitial_oid(toastRel1->rd_indexlist),
! linitial_oid(toastRel2->rd_indexlist),
! target_is_pg_class,
! swap_toast_by_content,
! is_internal,
! InvalidTransactionId,
! InvalidMultiXactId,
! mapped_tables);
! }
! else
! {
! /*
! * As this code path is only taken by shared catalogs, which cannot
! * have multiple indexes on their toast relation, simply raise
! * an error.
! */
! elog(ERROR,
! "cannot swap relation files of a shared catalog with multiple indexes on toast relation");
! }
!
! heap_close(toastRel1, AccessExclusiveLock);
! heap_close(toastRel2, AccessExclusiveLock);
! }
/* Clean up. */
heap_freetuple(reltup1);
***************
*** 1529,1540 **** finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
- Oid toastidx;
char NewToastName[NAMEDATALEN];
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
! toastidx = toastrel->rd_rel->reltoastidxid;
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
--- 1559,1571 ----
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Relation toastrel;
char NewToastName[NAMEDATALEN];
+ ListCell *lc;
+ int count = 0;
toastrel = relation_open(newrel->rd_rel->reltoastrelid,
AccessShareLock);
! RelationGetIndexList(toastrel);
relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
***************
*** 1543,1553 **** finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
! /* ... and its index too */
! snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
! OIDOldHeap);
! RenameRelationInternal(toastidx,
! NewToastName, true);
}
relation_close(newrel, NoLock);
}
--- 1574,1596 ----
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
! /* ... and its indexes too */
! foreach(lc, toastrel->rd_indexlist)
! {
! /*
! * The first index keeps the historical toast index name and the
! * following entries have a numeric suffix appended.
! */
! if (count == 0)
! snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
! OIDOldHeap);
! else
! snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
! OIDOldHeap, count);
! RenameRelationInternal(lfirst_oid(lc),
! NewToastName, true);
! count++;
! }
}
relation_close(newrel, NoLock);
}
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 8728,8734 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
--- 8728,8733 ----
***************
*** 8736,8741 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
--- 8735,8742 ----
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
***************
*** 8782,8788 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
! reltoastidxid = rel->rd_rel->reltoastidxid;
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
--- 8783,8795 ----
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
! /* Fetch the list of indexes on the toast relation if necessary */
! if (OidIsValid(reltoastrelid))
! {
! Relation toastRel = relation_open(reltoastrelid, lockmode);
! reltoastidxids = RelationGetIndexList(toastRel);
! relation_close(toastRel, lockmode);
! }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
***************
*** 8863,8870 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
! if (OidIsValid(reltoastidxid))
! ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
}
/*
--- 8870,8884 ----
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
! foreach(lc, reltoastidxids)
! {
! Oid toastidxid = lfirst_oid(lc);
! if (OidIsValid(toastidxid))
! ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
! }
!
! /* Clean up */
! list_free(reltoastidxids);
}
/*
*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 575,582 **** DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid/reltoastidxid of
! * the toast table we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
--- 575,582 ----
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid of the toast table
! * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
***************
*** 588,594 **** DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
--- 588,593 ----
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***************
*** 332,338 **** pg_relation_size(PG_FUNCTION_ARGS)
}
/*
! * Calculate total on-disk size of a TOAST relation, including its index.
* Must not be applied to non-TOAST relations.
*/
static int64
--- 332,338 ----
}
/*
! * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
***************
*** 340,347 **** calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
toastRel = relation_open(toastrelid, AccessShareLock);
--- 340,347 ----
{
int64 size = 0;
Relation toastRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
***************
*** 351,362 **** calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
! relation_close(toastIdxRel, AccessShareLock);
relation_close(toastRel, AccessShareLock);
return size;
--- 351,370 ----
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! RelationGetIndexList(toastRel);
! /* Size is calculated using all the indexes available */
! foreach(lc, toastRel->rd_indexlist)
! {
! Relation toastIdxRel;
! toastIdxRel = relation_open(lfirst_oid(lc),
! AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
!
! relation_close(toastIdxRel, AccessShareLock);
! }
relation_close(toastRel, AccessShareLock);
return size;
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 2781,2796 **** binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.reltoastidxid "
"FROM pg_catalog.pg_class c LEFT JOIN "
! "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
! "WHERE c.oid = '%u'::pg_catalog.oid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
--- 2781,2796 ----
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
! "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
! "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
***************
*** 2816,2822 **** binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has an index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
--- 2816,2822 ----
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has at least one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
*** a/src/include/catalog/pg_class.h
--- b/src/include/catalog/pg_class.h
***************
*** 48,54 **** CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
--- 48,53 ----
***************
*** 94,100 **** typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
! #define Natts_pg_class 29
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
--- 93,99 ----
* ----------------
*/
! #define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
***************
*** 107,129 **** typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
! #define Anum_pg_class_reltoastidxid 13
! #define Anum_pg_class_relhasindex 14
! #define Anum_pg_class_relisshared 15
! #define Anum_pg_class_relpersistence 16
! #define Anum_pg_class_relkind 17
! #define Anum_pg_class_relnatts 18
! #define Anum_pg_class_relchecks 19
! #define Anum_pg_class_relhasoids 20
! #define Anum_pg_class_relhaspkey 21
! #define Anum_pg_class_relhasrules 22
! #define Anum_pg_class_relhastriggers 23
! #define Anum_pg_class_relhassubclass 24
! #define Anum_pg_class_relispopulated 25
! #define Anum_pg_class_relfrozenxid 26
! #define Anum_pg_class_relminmxid 27
! #define Anum_pg_class_relacl 28
! #define Anum_pg_class_reloptions 29
/* ----------------
* initial contents of pg_class
--- 106,127 ----
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
! #define Anum_pg_class_relhasindex 13
! #define Anum_pg_class_relisshared 14
! #define Anum_pg_class_relpersistence 15
! #define Anum_pg_class_relkind 16
! #define Anum_pg_class_relnatts 17
! #define Anum_pg_class_relchecks 18
! #define Anum_pg_class_relhasoids 19
! #define Anum_pg_class_relhaspkey 20
! #define Anum_pg_class_relhasrules 21
! #define Anum_pg_class_relhastriggers 22
! #define Anum_pg_class_relhassubclass 23
! #define Anum_pg_class_relispopulated 24
! #define Anum_pg_class_relfrozenxid 25
! #define Anum_pg_class_relminmxid 26
! #define Anum_pg_class_relacl 27
! #define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
***************
*** 138,150 **** typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
! DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
--- 136,148 ----
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
! DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
*** a/src/include/utils/relcache.h
--- b/src/include/utils/relcache.h
***************
*** 29,34 **** typedef struct RelationData *Relation;
--- 29,44 ----
typedef Relation *RelationPtr;
/*
+ * RelationGetIndexListIfValid
+ * Get the index list of a relation, computing it only if not already cached.
+ */
+ #define RelationGetIndexListIfValid(rel) \
+ do { \
+ if ((rel)->rd_indexvalid == 0) \
+ RelationGetIndexList(rel); \
+ } while(0)
+
+ /*
* Routines to open (lookup) and close a relcache entry
*/
extern Relation RelationIdGetRelation(Oid relationId);
*** a/src/test/regress/expected/oidjoins.out
--- b/src/test/regress/expected/oidjoins.out
***************
*** 353,366 **** WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
- ------+---------------
- (0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 353,358 ----
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
***************
*** 1852,1866 **** SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
! | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
--- 1852,1866 ----
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
! | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
***************
*** 2347,2357 **** select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | reltoastidxid | relkind | relfrozenxid
! ---------------+---------------+---------+--------------
! 0 | 0 | v | 0
(1 row)
drop view fooview;
--- 2347,2357 ----
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | relkind | relfrozenxid
! ---------------+---------+--------------
! 0 | v | 0
(1 row)
drop view fooview;
*** a/src/test/regress/sql/oidjoins.sql
--- b/src/test/regress/sql/oidjoins.sql
***************
*** 177,186 **** SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 177,182 ----
*** a/src/test/regress/sql/rules.sql
--- b/src/test/regress/sql/rules.sql
***************
*** 872,878 **** create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
--- 872,878 ----
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
*** a/src/tools/findoidjoins/README
--- b/src/tools/findoidjoins/README
***************
*** 86,92 **** Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
- Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
--- 86,91 ----
On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
Please find an updated patch. The regression test "rules" has been
updated, and all the comments are addressed.

On Tue, Jun 18, 2013 at 6:35 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hi,
On 2013-06-18 10:53:25 +0900, Michael Paquier wrote:
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..3a6342c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,17 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 			"INSERT INTO info_rels "
 			"SELECT reltoastrelid "
 			"FROM info_rels i JOIN pg_catalog.pg_class c "
-			" ON i.reloid = c.oid"));
+			" ON i.reloid = c.oid "
+			" AND c.reltoastrelid != %u", InvalidOid));

 	PQclear(executeQueryOrDie(conn,
 			"INSERT INTO info_rels "
-			"SELECT reltoastidxid "
-			"FROM info_rels i JOIN pg_catalog.pg_class c "
-			" ON i.reloid = c.oid"));
+			"SELECT indexrelid "
+			"FROM pg_index "
+			"WHERE indrelid IN (SELECT reltoastrelid "
+			"                   FROM pg_class "
+			"                   WHERE oid >= %u "
+			"                   AND reltoastrelid != %u)",
+			FirstNormalObjectId, InvalidOid));

What's the idea behind the >= here?
It is here to avoid fetching the toast relations of system tables. But
I see your point, the inner query fetching the toast OIDs should do a
join on the existing info_rels and not try to do a join on a plain
pg_index, so changed this way.
I'd also rather not introduce knowledge about FirstNormalObjectId into
client applications... But you fixed it already.
	/* Clean up. */
	heap_freetuple(reltup1);

@@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (OidIsValid(newrel->rd_rel->reltoastrelid))
 	{
 		Relation	toastrel;
-		Oid			toastidx;
 		char		NewToastName[NAMEDATALEN];
+		ListCell   *lc;
+		int			count = 0;

 		toastrel = relation_open(newrel->rd_rel->reltoastrelid,
 								 AccessShareLock);
-		toastidx = toastrel->rd_rel->reltoastidxid;
+		RelationGetIndexList(toastrel);
 		relation_close(toastrel, AccessShareLock);

 		/* rename the toast table ... */
@@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
 							   NewToastName, true);

-		/* ... and its index too */
-		snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
-				 OIDOldHeap);
-		RenameRelationInternal(toastidx,
-							   NewToastName, true);
+		/* ... and its indexes too */
+		foreach(lc, toastrel->rd_indexlist)
+		{
+			/*
+			 * The first index keeps the former toast name and the
+			 * following entries have a suffix appended.
+			 */
+			if (count == 0)
+				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+						 OIDOldHeap);
+			else
+				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+						 OIDOldHeap, count);
+			RenameRelationInternal(lfirst_oid(lc),
+								   NewToastName, true);
+			count++;
+		}
 	}
 	relation_close(newrel, NoLock);
 }

Is it actually possible to get here with multiple toast indexes?
Actually it is possible. finish_heap_swap is also called for example
in ALTER TABLE where rewriting the table (phase 3), so I think it is
better to protect this code path this way.
But why would we copy invalid toast indexes over to the new relation?
Shouldn't the new relation have been freshly built in the previous
steps?
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
 typedef Relation *RelationPtr;

 /*
+ * RelationGetIndexListIfValid
+ *	Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+	if (rel->rd_indexvalid == 0) \
+		RelationGetIndexList(rel); \
+} while(0)

Isn't this function misnamed and should be
RelationGetIndexListIfInValid?

When naming that, I had more in mind: "get the list of indexes if it
is already there". It looks more intuitive to my mind.
I can't follow. RelationGetIndexListIfValid() doesn't return
anything. And it doesn't do anything if the list is already valid. It
only does something iff the list currently is invalid.
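For readers following the naming debate, the semantics in question can be sketched outside the backend. This toy (all names here are illustrative stand-ins, not PostgreSQL's actual relcache code) recomputes a cached list only when a validity flag is unset, which is exactly why a name ending in "IfInvalid" describes the behavior better than "IfValid":

```c
#include <stddef.h>

/* Toy stand-in for a relcache entry with a cached index list. */
typedef struct ToyRelation
{
	int		rd_indexvalid;		/* 0 = list not computed yet */
	int		rd_indexlist[4];	/* cached "index OIDs" */
	int		rd_nindexes;
	int		recomputations;		/* how often the list was rebuilt */
} ToyRelation;

/* Expensive path: rebuild the cached list (stands in for RelationGetIndexList). */
static void
toy_get_index_list(ToyRelation *rel)
{
	rel->rd_indexlist[0] = 16384;	/* made-up OID */
	rel->rd_nindexes = 1;
	rel->rd_indexvalid = 1;
	rel->recomputations++;
}

/* The pattern under discussion: only act when the cache is *invalid*. */
#define TOY_GET_INDEX_LIST_IF_INVALID(rel) \
	do { \
		if ((rel)->rd_indexvalid == 0) \
			toy_get_index_list(rel); \
	} while (0)
```

Invoking the macro twice only rebuilds the list once; the second call is a no-op because the flag is already set.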
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
	/* Clean up. */
	heap_freetuple(reltup1);

@@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	if (OidIsValid(newrel->rd_rel->reltoastrelid))
 	{
 		Relation	toastrel;
-		Oid			toastidx;
 		char		NewToastName[NAMEDATALEN];
+		ListCell   *lc;
+		int			count = 0;

 		toastrel = relation_open(newrel->rd_rel->reltoastrelid,
 								 AccessShareLock);
-		toastidx = toastrel->rd_rel->reltoastidxid;
+		RelationGetIndexList(toastrel);
 		relation_close(toastrel, AccessShareLock);

 		/* rename the toast table ... */
@@ -1543,11 +1585,23 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
 							   NewToastName, true);

-		/* ... and its index too */
-		snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
-				 OIDOldHeap);
-		RenameRelationInternal(toastidx,
-							   NewToastName, true);
+		/* ... and its indexes too */
+		foreach(lc, toastrel->rd_indexlist)
+		{
+			/*
+			 * The first index keeps the former toast name and the
+			 * following entries have a suffix appended.
+			 */
+			if (count == 0)
+				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
+						 OIDOldHeap);
+			else
+				snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index_%d",
+						 OIDOldHeap, count);
+			RenameRelationInternal(lfirst_oid(lc),
+								   NewToastName, true);
+			count++;
+		}
 	}
 	relation_close(newrel, NoLock);
 }

Is it actually possible to get here with multiple toast indexes?
Actually it is possible. finish_heap_swap is also called for example
in ALTER TABLE where rewriting the table (phase 3), so I think it is
better to protect this code path this way.

But why would we copy invalid toast indexes over to the new relation?
Shouldn't the new relation have been freshly built in the previous
steps?
What do you think about that? Using only the first valid index would be enough?
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
 typedef Relation *RelationPtr;

 /*
+ * RelationGetIndexListIfValid
+ *	Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+	if (rel->rd_indexvalid == 0) \
+		RelationGetIndexList(rel); \
+} while(0)

Isn't this function misnamed and should be
RelationGetIndexListIfInValid?

When naming that, I had more in mind: "get the list of indexes if it
is already there". It looks more intuitive to my mind.

I can't follow. RelationGetIndexListIfValid() doesn't return
anything. And it doesn't do anything if the list is already valid. It
only does something iff the list currently is invalid.

In this case RelationGetIndexListIfInvalid?
--
Michael
On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
@@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
Is it actually possible to get here with multiple toast indexes?
Actually it is possible. finish_heap_swap is also called for example
in ALTER TABLE where rewriting the table (phase 3), so I think it is
better to protect this code path this way.

But why would we copy invalid toast indexes over to the new relation?
Shouldn't the new relation have been freshly built in the previous
steps?

What do you think about that? Using only the first valid index would be enough?
What I am thinking about is the following: When we rewrite a relation,
we build a completely new toast relation. Which will only have one
index, right? So I don't see how this could be correct if we deal
with multiple indexes. In fact, the current patch's swap_relation_files
throws an error if there are multiple ones around.
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
 typedef Relation *RelationPtr;

 /*
+ * RelationGetIndexListIfValid
+ *	Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+	if (rel->rd_indexvalid == 0) \
+		RelationGetIndexList(rel); \
+} while(0)

Isn't this function misnamed and should be
RelationGetIndexListIfInValid?

When naming that, I had more in mind: "get the list of indexes if it
is already there". It looks more intuitive to my mind.

I can't follow. RelationGetIndexListIfValid() doesn't return
anything. And it doesn't do anything if the list is already valid. It
only does something iff the list currently is invalid.

In this case RelationGetIndexListIfInvalid?
Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?
Hm. Looking at how this is currently used - I am afraid it's not
correct... the reason RelationGetIndexList() returns a copy is that
cache invalidations will throw away that list. And you do index_open()
while iterating over it which will accept invalidation messages.
Maybe it's better to try using RelationGetIndexList directly and measure
whether that has a measurable impact?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
OK, let's finalize this patch first. I'll try to send an updated patch
later today.
On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
@@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
Is it actually possible to get here with multiple toast indexes?
Actually it is possible. finish_heap_swap is also called for example
in ALTER TABLE where rewriting the table (phase 3), so I think it is
better to protect this code path this way.

But why would we copy invalid toast indexes over to the new relation?
Shouldn't the new relation have been freshly built in the previous
steps?

What do you think about that? Using only the first valid index would be enough?
What I am thinking about is the following: When we rewrite a relation,
we build a completely new toast relation. Which will only have one
index, right? So I don't see how this could be correct if we deal
with multiple indexes. In fact, the current patch's swap_relation_files
throws an error if there are multiple ones around.
Yes, OK. Let me have a look at the code of CLUSTER more in details
before giving a precise answer, but I'll try to remove that renaming
part. Btw, I'd like to add an assertion in the code at least to
prevent wrong use of this code path.
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
 typedef Relation *RelationPtr;

 /*
+ * RelationGetIndexListIfValid
+ *	Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+	if (rel->rd_indexvalid == 0) \
+		RelationGetIndexList(rel); \
+} while(0)

Isn't this function misnamed and should be
RelationGetIndexListIfInValid?

When naming that, I had more in mind: "get the list of indexes if it
is already there". It looks more intuitive to my mind.

I can't follow. RelationGetIndexListIfValid() doesn't return
anything. And it doesn't do anything if the list is already valid. It
only does something iff the list currently is invalid.

In this case RelationGetIndexListIfInvalid?
Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?
Hm. Looking at how this is currently used - I am afraid it's not
correct... the reason RelationGetIndexList() returns a copy is that
cache invalidations will throw away that list. And you do index_open()
while iterating over it which will accept invalidation messages.
Maybe it's better to try using RelationGetIndexList directly and measure
whether that has a measurable impact?
Yes, I was wondering about the potential memory leak that list_copy
could introduce in tuptoaster.c when doing a bulk insert; that's the
only reason why I added this macro.
--
Michael
On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hm. Looking at how this is currently used - I am afraid it's not
correct... the reason RelationGetIndexList() returns a copy is that
cache invalidations will throw away that list. And you do index_open()
while iterating over it which will accept invalidation messages.
Maybe it's better to try using RelationGetIndexList directly and measure
whether that has a measurable impact?
Looking at the comments of RelationGetIndexList in relcache.c, the
method of the patch is actually correct: in the event of a shared
cache invalidation, rd_indexvalid is set to 0 when the index list is
reset, so the index list gets recomputed even in the case of shared
memory invalidation.
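The trade-off being argued here can be sketched with a toy cache (illustrative names only, not the backend's List machinery): if an "invalidation" rebuilds the cached array while a caller is still walking it, a caller that took a private copy keeps a stable snapshot, which is what RelationGetIndexList's list_copy buys at the cost of an allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Toy cache whose backing array is thrown away by an "invalidation". */
typedef struct ToyCache
{
	int	   *list;	/* cached entries; freed and rebuilt on invalidation */
	int		len;
} ToyCache;

/* Rebuild the cache from scratch, discarding the old backing array. */
static void
toy_cache_fill(ToyCache *c, int base, int len)
{
	free(c->list);
	c->list = malloc(sizeof(int) * len);
	for (int i = 0; i < len; i++)
		c->list[i] = base + i;
	c->len = len;
}

/* What list_copy gives callers: a private snapshot that survives invalidation. */
static int *
toy_cache_copy(const ToyCache *c, int *len)
{
	int	   *copy = malloc(sizeof(int) * c->len);

	memcpy(copy, c->list, sizeof(int) * c->len);
	*len = c->len;
	return copy;
}

/* Sum the snapshot even though the cache is rebuilt mid-iteration. */
static int
toy_sum_with_snapshot(ToyCache *c)
{
	int		len;
	int	   *snap = toy_cache_copy(c, &len);
	int		sum = 0;

	for (int i = 0; i < len; i++)
	{
		/* An invalidation arriving here rebuilds c->list, but snap is safe. */
		if (i == 0)
			toy_cache_fill(c, 100, 2);
		sum += snap[i];
	}
	free(snap);
	return sum;
}
```

Iterating c->list directly in the same scenario would read freed memory, which is the concern raised about doing index_open() while walking rd_indexlist.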
--
Michael
OK, please find attached a new patch for the toast part. IMHO, the
patch is now in pretty good shape... But I cannot judge for others.
On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-21 20:54:34 +0900, Michael Paquier wrote:
On Fri, Jun 21, 2013 at 6:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-19 09:55:24 +0900, Michael Paquier wrote:
@@ -1529,12 +1570,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
Is it actually possible to get here with multiple toast indexes?
Actually it is possible. finish_heap_swap is also called for example
in ALTER TABLE where rewriting the table (phase 3), so I think it is
better to protect this code path this way.

But why would we copy invalid toast indexes over to the new relation?
Shouldn't the new relation have been freshly built in the previous
steps?

What do you think about that? Using only the first valid index would be enough?
What I am thinking about is the following: When we rewrite a relation,
we build a completely new toast relation. Which will only have one
index, right? So I don't see how this could be correct if we deal
with multiple indexes. In fact, the current patch's swap_relation_files
throws an error if there are multiple ones around.
I have reworked the code in cluster.c and made the changes more
consistent, knowing that a given toast relation should only have one
valid index. This minimizes the modifications where relfilenodes are
swapped for toast indexes, as the swap is now done only on the single
valid index that a toast relation has. Also, I removed the error that
was triggered in previous versions when a toast relation had more than
one index.
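The invariant described here (at most one valid index per toast relation at any time) makes the lookup itself trivial. A minimal sketch of that selection, with made-up types rather than the patch's actual toast_index_fetch_valid:

```c
#include <stddef.h>

/* Toy index descriptor; indisvalid mirrors pg_index.indisvalid. */
typedef struct ToyIndex
{
	unsigned int	oid;
	int				indisvalid;
} ToyIndex;

/*
 * Return the OID of the single valid index, or 0 if none.  Under the
 * one-valid-index invariant, the first valid entry is the only one,
 * so a linear scan that stops at the first match is sufficient.
 */
static unsigned int
toy_fetch_valid_index(const ToyIndex *idxs, int nindexes)
{
	for (int i = 0; i < nindexes; i++)
	{
		if (idxs[i].indisvalid)
			return idxs[i].oid;
	}
	return 0;	/* no valid index found */
}
```

During REINDEX CONCURRENTLY a second, still-invalid index exists alongside the valid one, which is why the scan must filter on validity rather than assume a single entry.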
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 8ac2549..31309ed 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -29,6 +29,16 @@ typedef struct RelationData *Relation;
 typedef Relation *RelationPtr;

 /*
+ * RelationGetIndexListIfValid
+ *	Get index list of relation without recomputing it.
+ */
+#define RelationGetIndexListIfValid(rel) \
+do { \
+	if (rel->rd_indexvalid == 0) \
+		RelationGetIndexList(rel); \
+} while(0)

Isn't this function misnamed and should be
RelationGetIndexListIfInValid?

When naming that, I had more in mind: "get the list of indexes if it
is already there". It looks more intuitive to my mind.

I can't follow. RelationGetIndexListIfValid() doesn't return
anything. And it doesn't do anything if the list is already valid. It
only does something iff the list currently is invalid.

In this case RelationGetIndexListIfInvalid?
Yep. Suggested that above ;). Maybe RelationFetchIndexListIfInvalid()?
Changed the function name this way.
Also, I quickly ran the performance test that Andres sent previously
on my MBA and I couldn't notice any difference in performance.
master branch + patch:
tps = 2034.339242 (including connections establishing)
tps = 2034.406515 (excluding connections establishing)
master branch:
tps = 2083.172009 (including connections establishing)
tps = 2083.237669 (excluding connections establishing)
Thanks,
--
Michael
Attachments:
20130622_1_remove_reltoastidxid_v11.patch (application/octet-stream)
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
***************
*** 321,332 **** get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT reltoastidxid "
! "FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
--- 321,339 ----
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT indexrelid "
! "FROM pg_index "
! "WHERE indisvalid "
! " AND indrelid IN (SELECT reltoastrelid "
! " FROM info_rels i "
! " JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u)",
! InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 1745,1759 ****
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
--- 1745,1750 ----
*** a/doc/src/sgml/diskusage.sgml
--- b/doc/src/sgml/diskusage.sgml
***************
*** 20,31 ****
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
! table (see <xref linkend="storage-toast">). There will be one index on the
! <acronym>TOAST</> table, if present. There also might be indexes associated
! with the base table. Each table and index is stored in a separate disk
! file — possibly more than one file, if the file would exceed one
! gigabyte. Naming conventions for these files are described in <xref
! linkend="storage-file-layout">.
</para>
<para>
--- 20,31 ----
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
! table (see <xref linkend="storage-toast">). There will be one valid index
! on the <acronym>TOAST</> table, if present. There also might be indexes
! associated with the base table. Each table and index is stored in a
! separate disk file — possibly more than one file, if the file would
! exceed one gigabyte. Naming conventions for these files are described
! in <xref linkend="storage-file-layout">.
</para>
<para>
***************
*** 44,50 ****
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
--- 44,50 ----
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
***************
*** 65,76 **** FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT reltoastidxid
! FROM pg_class
! WHERE oid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
--- 65,76 ----
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT indexrelid
! FROM pg_index
! WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
***************
*** 87,93 **** WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
--- 87,93 ----
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
***************
*** 101,107 **** SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
--- 101,107 ----
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***************
*** 1163,1174 **** postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
! <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
! <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
</row>
</tbody>
</tgroup>
--- 1163,1174 ----
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
! <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
! <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***************
*** 76,86 **** do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
/* ----------
--- 76,88 ----
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel,
! Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+ static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
***************
*** 1222,1227 **** toast_compress_datum(Datum value)
--- 1224,1270 ----
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of given toast relation. A toast relation can only
+ * have one valid index at the same time. The lock taken on the index
+ * relations is released at the end of this function call.
+ */
+ Oid
+ toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+ {
+ ListCell *lc;
+ int num_indexes, i = 0;
+ Oid validIndexOid;
+ Relation validIndexRel;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Get the index list of relation */
+ toastrel = heap_open(toastoid, lock);
+ RelationGetIndexListIfInvalid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open all the index relations */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch valid toast index */
+ validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
+ validIndexOid = RelationGetRelid(validIndexRel);
+
+ /* Close all the index relations */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+
+ heap_close(toastrel, lock);
+ return validIndexOid;
+ }
+
+
+ /* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
***************
*** 1237,1244 **** static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
! Relation toastrel;
! Relation toastidx;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
--- 1280,1287 ----
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
***************
*** 1257,1271 **** toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
--- 1300,1328 ----
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID. A toast table can have multiple identical
! * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfInvalid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /* Open all the indexes of the toast relation with the same lock */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
!
! /* Fetch the valid index relation to use */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
***************
*** 1330,1336 **** toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
}
else
--- 1387,1393 ----
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
***************
*** 1367,1373 **** toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
--- 1424,1431 ----
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid,
! RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
***************
*** 1384,1390 **** toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
--- 1442,1448 ----
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
***************
*** 1423,1438 **** toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! index_insert(toastidx, t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidx->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
--- 1481,1498 ----
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table for all the
! * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! for (i = 0; i < num_indexes; i++)
! index_insert(toastidxs[i], t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidxs[i]->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
***************
*** 1447,1456 **** toast_save_datum(Relation rel, Datum value,
}
/*
! * Done - close toast relation
*/
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
* Create the TOAST pointer value that we'll return
--- 1507,1518 ----
}
/*
! * Done - close toast relations
*/
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
***************
*** 1474,1484 **** toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
if (!VARATT_IS_EXTERNAL(attr))
return;
--- 1536,1549 ----
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
***************
*** 1487,1496 **** toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1552,1573 ----
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! RelationGetIndexListIfInvalid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /*
! * We actually use only the first valid index but taking a lock on all is
! * necessary.
! */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
!
! /* Fetch the valid index relation to use */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1505,1511 **** toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1582,1588 ----
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1519,1526 **** toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
--- 1596,1605 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
***************
*** 1531,1541 **** toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1610,1637 ----
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ RelationGetIndexListIfInvalid(toastrel);
+ num_indexes = list_length(toastrel->rd_indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, toastrel->rd_indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1548,1554 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
--- 1644,1651 ----
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel,
! RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
***************
*** 1556,1561 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
--- 1653,1663 ----
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ pfree(toastidxs);
+
return result;
}
***************
*** 1573,1579 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid);
heap_close(toastrel, AccessShareLock);
--- 1675,1681 ----
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
***************
*** 1591,1598 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
--- 1693,1700 ----
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
***************
*** 1607,1612 **** toast_fetch_datum(struct varlena * attr)
--- 1709,1717 ----
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
***************
*** 1622,1632 **** toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
--- 1727,1747 ----
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfInvalid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /* Open all the indexes of the toast relation with the same lock */
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
!
! /* Fetch the valid index relation to use */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
***************
*** 1645,1651 **** toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1760,1766 ----
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1734,1741 **** toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 1849,1858 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
***************
*** 1750,1757 **** toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
--- 1867,1874 ----
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
***************
*** 1774,1779 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
--- 1891,1899 ----
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
Assert(VARATT_IS_EXTERNAL(attr));
***************
*** 1816,1826 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
--- 1936,1953 ----
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! RelationGetIndexListIfInvalid(toastrel);
! num_indexes = list_length(toastrel->rd_indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! foreach(lc, toastrel->rd_indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
***************
*** 1861,1867 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1988,1994 ----
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1958,1965 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 2085,2120 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], AccessShareLock);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+ /* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index from the list of indexes of a toast relation. These
+ * index relations need to be already open prior to calling this routine.
+ */
+ static Relation
+ toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+ {
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+ }
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 781,787 **** InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
--- 781,786 ----
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 103,109 **** static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! Oid reltoastidxid, double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
--- 103,109 ----
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
***************
*** 1072,1078 **** index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
--- 1072,1077 ----
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***************
*** 473,488 **** CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! pg_stat_get_blocks_fetched(X.oid) -
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_class X ON T.reltoastidxid = X.oid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
--- 473,488 ----
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! sum(pg_stat_get_blocks_fetched(X.indexrelid) -
! pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
! sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 21,26 ****
--- 21,27 ----
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+ #include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
***************
*** 1172,1179 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
--- 1173,1178 ----
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 8728,8734 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
--- 8728,8733 ----
*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 575,582 **** DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid/reltoastidxid of
! * the toast table we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
--- 575,582 ----
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid of the toast table
! * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
***************
*** 588,594 **** DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
--- 588,593 ----
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***************
*** 332,338 **** pg_relation_size(PG_FUNCTION_ARGS)
}
/*
! * Calculate total on-disk size of a TOAST relation, including its index.
* Must not be applied to non-TOAST relations.
*/
static int64
--- 332,338 ----
}
/*
! * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
***************
*** 340,347 **** calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
toastRel = relation_open(toastrelid, AccessShareLock);
--- 340,347 ----
{
int64 size = 0;
Relation toastRel;
ForkNumber forkNum;
+ ListCell *lc;
toastRel = relation_open(toastrelid, AccessShareLock);
***************
*** 351,362 **** calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
! relation_close(toastIdxRel, AccessShareLock);
relation_close(toastRel, AccessShareLock);
return size;
--- 351,370 ----
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! RelationGetIndexList(toastRel);
! /* Size is calculated using all the indexes available */
! foreach(lc, toastRel->rd_indexlist)
! {
! Relation toastIdxRel;
! toastIdxRel = relation_open(lfirst_oid(lc),
! AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
!
! relation_close(toastIdxRel, AccessShareLock);
! }
relation_close(toastRel, AccessShareLock);
return size;
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 2781,2796 **** binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.reltoastidxid "
"FROM pg_catalog.pg_class c LEFT JOIN "
! "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
! "WHERE c.oid = '%u'::pg_catalog.oid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
--- 2781,2796 ----
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
! "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
! "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
***************
*** 2816,2822 **** binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has an index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
--- 2816,2822 ----
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has at least one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
*** a/src/include/access/tuptoaster.h
--- b/src/include/access/tuptoaster.h
***************
*** 15,20 ****
--- 15,21 ----
#include "access/htup_details.h"
#include "utils/relcache.h"
+ #include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
***************
*** 188,191 **** extern Size toast_raw_datum_size(Datum value);
--- 189,200 ----
*/
extern Size toast_datum_size(Datum value);
+ /* ----------
+ * toast_get_valid_index -
+ *
+ * Return the valid index associated with a toast relation
+ * ----------
+ */
+ extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
*** a/src/include/catalog/pg_class.h
--- b/src/include/catalog/pg_class.h
***************
*** 48,54 **** CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
--- 48,53 ----
*** a/src/include/utils/relcache.h
--- b/src/include/utils/relcache.h
***************
*** 29,34 **** typedef struct RelationData *Relation;
--- 29,44 ----
typedef Relation *RelationPtr;
/*
+ * RelationGetIndexListIfInvalid
+ * Get the index list of a relation without recomputing it if it already exists.
+ */
+ #define RelationGetIndexListIfInvalid(rel) \
+ do { \
+ if (rel->rd_indexvalid == 0) \
+ RelationGetIndexList(rel); \
+ } while(0)
+
+ /*
* Routines to open (lookup) and close a relcache entry
*/
extern Relation RelationIdGetRelation(Oid relationId);
*** a/src/test/regress/expected/oidjoins.out
--- b/src/test/regress/expected/oidjoins.out
***************
*** 353,366 **** WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
- ------+---------------
- (0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 353,358 ----
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
***************
*** 1852,1866 **** SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
! | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
--- 1852,1866 ----
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
! | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
***************
*** 2347,2357 **** select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | reltoastidxid | relkind | relfrozenxid
! ---------------+---------------+---------+--------------
! 0 | 0 | v | 0
(1 row)
drop view fooview;
--- 2347,2357 ----
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | relkind | relfrozenxid
! ---------------+---------+--------------
! 0 | v | 0
(1 row)
drop view fooview;
*** a/src/test/regress/sql/oidjoins.sql
--- b/src/test/regress/sql/oidjoins.sql
***************
*** 177,186 **** SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 177,182 ----
*** a/src/test/regress/sql/rules.sql
--- b/src/test/regress/sql/rules.sql
***************
*** 872,878 **** create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
--- 872,878 ----
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
*** a/src/tools/findoidjoins/README
--- b/src/tools/findoidjoins/README
***************
*** 86,92 **** Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
- Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
--- 86,91 ----
On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hm. Looking at how this is currently used - I am afraid it's not
correct... the reason RelationGetIndexList() returns a copy is that
cache invalidations will throw away that list. And you do index_open()
while iterating over it which will accept invalidation messages.
Maybe it's better to try using RelationGetIndexList directly and measure
whether that has a measurable impact.
By looking at the comments of RelationGetIndexList in relcache.c,
actually the method of the patch is correct because in the event of a
shared cache invalidation, rd_indexvalid is set to 0 when the index
list is reset, so the index list would get recomputed even in the case
of shared mem invalidation.
The problem I see is something else. Consider code like the following:
RelationFetchIndexListIfInvalid(toastrel);
foreach(lc, toastrel->rd_indexlist)
toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
index_open calls relation_open calls LockRelationOid which does:
if (res != LOCKACQUIRE_ALREADY_HELD)
AcceptInvalidationMessages();
So, what might happen is that you open the first index, which accepts an
invalidation message which in turn might delete the indexlist. Which
means we would likely read invalid memory if there are two indexes.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
On Fri, Jun 21, 2013 at 10:47 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Hm. Looking at how this is currently used - I am afraid it's not
correct... the reason RelationGetIndexList() returns a copy is that
cache invalidations will throw away that list. And you do index_open()
while iterating over it which will accept invalidation messages.
Maybe it's better to try using RelationGetIndexList directly and measure
whether that has a measurable impact?
By looking at the comments of RelationGetIndexList:relcache.c,
actually the method of the patch is correct because in the event of a
shared cache invalidation, rd_indexvalid is set to 0 when the index
list is reset, so the index list would get recomputed even in the case
of shared mem invalidation.
The problem I see is something else. Consider code like the following:
RelationFetchIndexListIfInvalid(toastrel);
foreach(lc, toastrel->rd_indexlist)
toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
index_open calls relation_open calls LockRelationOid which does:
if (res != LOCKACQUIRE_ALREADY_HELD)
AcceptInvalidationMessages();
So, what might happen is that you open the first index, which accepts an
invalidation message which in turn might delete the indexlist. Which
means we would likely read invalid memory if there are two indexes.
And I imagine that you have the same problem even with
RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
this would appear as long as you try to open more than 1 index with an
index list.
--
Michael
On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
By looking at the comments of RelationGetIndexList:relcache.c,
actually the method of the patch is correct because in the event of a
shared cache invalidation, rd_indexvalid is set to 0 when the index
list is reset, so the index list would get recomputed even in the case
of shared mem invalidation.
The problem I see is something else. Consider code like the following:
RelationFetchIndexListIfInvalid(toastrel);
foreach(lc, toastrel->rd_indexlist)
toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
index_open calls relation_open calls LockRelationOid which does:
if (res != LOCKACQUIRE_ALREADY_HELD)
AcceptInvalidationMessages();
So, what might happen is that you open the first index, which accepts an
invalidation message which in turn might delete the indexlist. Which
means we would likely read invalid memory if there are two indexes.
And I imagine that you have the same problem even with
RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
this would appear as long as you try to open more than 1 index with an
index list.
No. RelationGetIndexList() returns a copy of the list for exactly that
reason. The danger is not to see an outdated list - we should be
protected by locks against that - but looking at uninitialized or reused
memory.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
And I imagine that you have the same problem even with
RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
this would appear as long as you try to open more than 1 index with an
index list.
No. RelationGetIndexList() returns a copy of the list for exactly that
reason. The danger is not to see an outdated list - we should be
protected by locks against that - but looking at uninitialized or reused
memory.
Are we doing this only to save some palloc traffic? Could we do this
by, say, teaching list_copy() to have a special case for lists of ints
and oids that allocates all the cells in a single palloc chunk?
(This has the obvious problem that list_free no longer works, of
course. But I think that specific problem can be easily fixed. Not
sure if it causes more breakage elsewhere.)
Alternatively, I guess we could grab an uncopied list, then copy the
items individually into a locally allocated array, avoiding list_copy.
We'd need to iterate differently than with foreach().
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
OK. Please find an updated patch for the toast part.
On Sat, Jun 22, 2013 at 10:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
By looking at the comments of RelationGetIndexList:relcache.c,
actually the method of the patch is correct because in the event of a
shared cache invalidation, rd_indexvalid is set to 0 when the index
list is reset, so the index list would get recomputed even in the case
of shared mem invalidation.
The problem I see is something else. Consider code like the following:
RelationFetchIndexListIfInvalid(toastrel);
foreach(lc, toastrel->rd_indexlist)
toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
index_open calls relation_open calls LockRelationOid which does:
if (res != LOCKACQUIRE_ALREADY_HELD)
AcceptInvalidationMessages();
So, what might happen is that you open the first index, which accepts an
invalidation message which in turn might delete the indexlist. Which
means we would likely read invalid memory if there are two indexes.
And I imagine that you have the same problem even with
RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
this would appear as long as you try to open more than 1 index with an
index list.
No. RelationGetIndexList() returns a copy of the list for exactly that
reason. The danger is not to see an outdated list - we should be
protected by locks against that - but looking at uninitialized or reused
memory.
OK, so I removed RelationGetIndexListIfInvalid (such things could be
an optimization for another patch) and replaced it with calls to
RelationGetIndexList to get a copy of rd_indexlist in a local list
variable, which is freed once it is no longer needed.
It looks like there is nothing left for this patch, no?
--
Michael
Attachments:
20130623_1_remove_reltoastidxid_v12.patch
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
***************
*** 321,332 **** get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT reltoastidxid "
! "FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
--- 321,339 ----
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT indexrelid "
! "FROM pg_index "
! "WHERE indisvalid "
! " AND indrelid IN (SELECT reltoastrelid "
! " FROM info_rels i "
! " JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u)",
! InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 1745,1759 ****
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
--- 1745,1750 ----
*** a/doc/src/sgml/diskusage.sgml
--- b/doc/src/sgml/diskusage.sgml
***************
*** 20,31 ****
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
! table (see <xref linkend="storage-toast">). There will be one index on the
! <acronym>TOAST</> table, if present. There also might be indexes associated
! with the base table. Each table and index is stored in a separate disk
! file — possibly more than one file, if the file would exceed one
! gigabyte. Naming conventions for these files are described in <xref
! linkend="storage-file-layout">.
</para>
<para>
--- 20,31 ----
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
! table (see <xref linkend="storage-toast">). There will be one valid index
! on the <acronym>TOAST</> table, if present. There also might be indexes
! associated with the base table. Each table and index is stored in a
! separate disk file — possibly more than one file, if the file would
! exceed one gigabyte. Naming conventions for these files are described
! in <xref linkend="storage-file-layout">.
</para>
<para>
***************
*** 44,50 ****
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
--- 44,50 ----
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
***************
*** 65,76 **** FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT reltoastidxid
! FROM pg_class
! WHERE oid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
--- 65,76 ----
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT indexrelid
! FROM pg_index
! WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
***************
*** 87,93 **** WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
--- 87,93 ----
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
***************
*** 101,107 **** SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
--- 101,107 ----
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***************
*** 1163,1174 **** postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
! <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
! <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
</row>
</tbody>
</tgroup>
--- 1163,1174 ----
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
! <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
! <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***************
*** 76,86 **** do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
/* ----------
--- 76,88 ----
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel,
! Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+ static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
***************
*** 1222,1227 **** toast_compress_datum(Datum value)
--- 1224,1272 ----
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of given toast relation. A toast relation can only
+ * have one valid index at the same time. The lock taken on the index
+ * relations is released at the end of this function call.
+ */
+ Oid
+ toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+ {
+ ListCell *lc;
+ List *indexlist;
+ int num_indexes, i = 0;
+ Oid validIndexOid;
+ Relation validIndexRel;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Get the index list of relation */
+ toastrel = heap_open(toastoid, lock);
+ indexlist = RelationGetIndexList(toastrel);
+ num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch valid toast index */
+ validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
+ validIndexOid = RelationGetRelid(validIndexRel);
+
+ /* Close all the index relations */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+ list_free(indexlist);
+
+ heap_close(toastrel, lock);
+ return validIndexOid;
+ }
+
+
+ /* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
***************
*** 1237,1244 **** static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
! Relation toastrel;
! Relation toastidx;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
--- 1282,1289 ----
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
***************
*** 1257,1271 **** toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
--- 1302,1331 ----
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ List *indexlist;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID. A toast table can have multiple identical
! * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! indexlist = RelationGetIndexList(toastrel);
! num_indexes = list_length(indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /* Open all the indexes of toast relation with similar lock */
! foreach(lc, indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
!
! /* Fetch the valid index relation to use */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
***************
*** 1330,1336 **** toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
}
else
--- 1390,1396 ----
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
***************
*** 1367,1373 **** toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
--- 1427,1434 ----
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid,
! RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
***************
*** 1384,1390 **** toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
--- 1445,1451 ----
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
***************
*** 1423,1438 **** toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! index_insert(toastidx, t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidx->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
--- 1484,1501 ----
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table for all the
! * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! for (i = 0; i < num_indexes; i++)
! index_insert(toastidxs[i], t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidxs[i]->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
***************
*** 1447,1456 **** toast_save_datum(Relation rel, Datum value,
}
/*
! * Done - close toast relation
*/
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
* Create the TOAST pointer value that we'll return
--- 1510,1522 ----
}
/*
! * Done - close toast relations
*/
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], RowExclusiveLock);
! list_free(indexlist);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
***************
*** 1474,1484 **** toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
if (!VARATT_IS_EXTERNAL(attr))
return;
--- 1540,1554 ----
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ List *indexlist;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
***************
*** 1487,1496 **** toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1557,1578 ----
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! indexlist = RelationGetIndexList(toastrel);
! num_indexes = list_length(indexlist);
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /*
! * We actually use only the first valid index but taking a lock on all is
! * necessary.
! */
! foreach(lc, indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
!
! /* Fetch the valid index relation to use */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1505,1511 **** toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1587,1593 ----
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1519,1526 **** toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
--- 1601,1611 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], RowExclusiveLock);
! list_free(indexlist);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
***************
*** 1531,1541 **** toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1616,1644 ----
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+ List *indexlist;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ indexlist = RelationGetIndexList(toastrel);
+ num_indexes = list_length(indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1548,1554 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
--- 1651,1658 ----
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel,
! RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
***************
*** 1556,1561 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
--- 1660,1671 ----
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ list_free(indexlist);
+ pfree(toastidxs);
+
return result;
}
***************
*** 1573,1579 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid);
heap_close(toastrel, AccessShareLock);
--- 1683,1689 ----
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
***************
*** 1591,1598 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
--- 1701,1708 ----
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
***************
*** 1607,1612 **** toast_fetch_datum(struct varlena * attr)
--- 1717,1726 ----
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ List *indexlist;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
***************
*** 1622,1632 **** toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
--- 1736,1756 ----
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! indexlist = RelationGetIndexList(toastrel);
! num_indexes = list_length(indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! /* Open all the indexes of toast relation with similar lock */
! foreach(lc, indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
!
! /* Fetch the valid index relation to use */
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
***************
*** 1645,1651 **** toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1769,1775 ----
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1734,1741 **** toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 1858,1868 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], AccessShareLock);
! list_free(indexlist);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
***************
*** 1750,1757 **** toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
! Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
--- 1877,1884 ----
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
! Relation toastrel, validtoastidx;
! Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
***************
*** 1774,1779 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
--- 1901,1910 ----
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
+ List *indexlist;
Assert(VARATT_IS_EXTERNAL(attr));
***************
*** 1816,1826 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
--- 1947,1964 ----
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! indexlist = RelationGetIndexList(toastrel);
! num_indexes = list_length(indexlist);
!
! toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
!
! foreach(lc, indexlist)
! toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
! validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
***************
*** 1861,1867 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1999,2005 ----
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1958,1965 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 2096,2132 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! for (i = 0; i < num_indexes; i++)
! index_close(toastidxs[i], AccessShareLock);
! list_free(indexlist);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+ /* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index from the list of indexes of a toast relation. The index
+ * relations need to be already opened prior to calling this routine.
+ */
+ static Relation
+ toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+ {
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+ }
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 781,787 **** InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
--- 781,786 ----
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 103,109 **** static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! Oid reltoastidxid, double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
--- 103,109 ----
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
***************
*** 1072,1078 **** index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
--- 1072,1077 ----
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***************
*** 473,488 **** CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! pg_stat_get_blocks_fetched(X.oid) -
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_class X ON T.reltoastidxid = X.oid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
--- 473,488 ----
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! sum(pg_stat_get_blocks_fetched(X.indexrelid) -
! pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
! sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
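As a quick check of the revised view definition above, the aggregated toast-index counters can be read back directly (a sketch; 'customer' is an illustrative table name):

```sql
-- With the change above, tidx_blks_read/tidx_blks_hit sum the I/O
-- statistics of all indexes on the TOAST table, instead of reading
-- a single index through the removed reltoastidxid column.
SELECT relname, tidx_blks_read, tidx_blks_hit
FROM pg_statio_all_tables
WHERE relname = 'customer';  -- illustrative table name
```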
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 21,26 ****
--- 21,27 ----
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+ #include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
***************
*** 1172,1179 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
--- 1173,1178 ----
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 8728,8734 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
--- 8728,8733 ----
*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 575,582 **** DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid/reltoastidxid of
! * the toast table we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
--- 575,582 ----
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid of the toast table
! * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
***************
*** 588,594 **** DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
--- 588,593 ----
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***************
*** 332,338 **** pg_relation_size(PG_FUNCTION_ARGS)
}
/*
! * Calculate total on-disk size of a TOAST relation, including its index.
* Must not be applied to non-TOAST relations.
*/
static int64
--- 332,338 ----
}
/*
! * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
***************
*** 340,347 **** calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
toastRel = relation_open(toastrelid, AccessShareLock);
--- 340,348 ----
{
int64 size = 0;
Relation toastRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
***************
*** 351,362 **** calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
! relation_close(toastIdxRel, AccessShareLock);
relation_close(toastRel, AccessShareLock);
return size;
--- 352,372 ----
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! indexlist = RelationGetIndexList(toastRel);
! /* Size is calculated using all the indexes available */
! foreach(lc, indexlist)
! {
! Relation toastIdxRel;
! toastIdxRel = relation_open(lfirst_oid(lc),
! AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
!
! relation_close(toastIdxRel, AccessShareLock);
! }
! list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
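The loop above sums every index of the TOAST relation. Roughly the same figure can be obtained from SQL (a sketch; 'customer' is an illustrative table name, and pg_relation_size counts only the main fork while the C code also adds the FSM and VM forks):

```sql
-- Approximate what calculate_toast_table_size() now computes:
-- the TOAST heap plus all of its indexes, found via pg_index
-- rather than the removed reltoastidxid column.
SELECT pg_relation_size(c.reltoastrelid)
       + COALESCE(sum(pg_relation_size(i.indexrelid)), 0) AS toast_total
FROM pg_class c
     LEFT JOIN pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.relname = 'customer'
GROUP BY c.reltoastrelid;
```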
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 2781,2796 **** binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.reltoastidxid "
"FROM pg_catalog.pg_class c LEFT JOIN "
! "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
! "WHERE c.oid = '%u'::pg_catalog.oid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
--- 2781,2796 ----
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
! "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
! "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
! pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
***************
*** 2816,2822 **** binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has an index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
--- 2816,2822 ----
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
*** a/src/include/access/tuptoaster.h
--- b/src/include/access/tuptoaster.h
***************
*** 15,20 ****
--- 15,21 ----
#include "access/htup_details.h"
#include "utils/relcache.h"
+ #include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
***************
*** 188,191 **** extern Size toast_raw_datum_size(Datum value);
--- 189,200 ----
*/
extern Size toast_datum_size(Datum value);
+ /* ----------
+ * toast_get_valid_index -
+ *
+ * Return the valid index associated with a toast relation
+ * ----------
+ */
+ extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
*** a/src/include/catalog/pg_class.h
--- b/src/include/catalog/pg_class.h
***************
*** 48,54 **** CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
--- 48,53 ----
*** a/src/test/regress/expected/oidjoins.out
--- b/src/test/regress/expected/oidjoins.out
***************
*** 353,366 **** WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
- ------+---------------
- (0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 353,358 ----
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
***************
*** 1852,1866 **** SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
! | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
--- 1852,1866 ----
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
! | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
***************
*** 2347,2357 **** select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | reltoastidxid | relkind | relfrozenxid
! ---------------+---------------+---------+--------------
! 0 | 0 | v | 0
(1 row)
drop view fooview;
--- 2347,2357 ----
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | relkind | relfrozenxid
! ---------------+---------+--------------
! 0 | v | 0
(1 row)
drop view fooview;
*** a/src/test/regress/sql/oidjoins.sql
--- b/src/test/regress/sql/oidjoins.sql
***************
*** 177,186 **** SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 177,182 ----
*** a/src/test/regress/sql/rules.sql
--- b/src/test/regress/sql/rules.sql
***************
*** 872,878 **** create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
--- 872,878 ----
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
*** a/src/tools/findoidjoins/README
--- b/src/tools/findoidjoins/README
***************
*** 86,92 **** Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
- Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
--- 86,91 ----
On Wed, Jun 19, 2013 at 9:50 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Jun 19, 2013 at 12:36 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Tue, Jun 18, 2013 at 10:53 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
An updated patch for the toast part is attached.
On Tue, Jun 18, 2013 at 3:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Here are the review comments of the removal_of_reltoastidxid patch.
I've not completed the review yet, but I'd like to post the current comments
before going to bed ;)

*** a/src/backend/catalog/system_views.sql
-                 pg_stat_get_blocks_fetched(X.oid) -
-                 pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
-                 pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+                 pg_stat_get_blocks_fetched(X.indrelid) -
+                 pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_read,
+                 pg_stat_get_blocks_hit(X.indrelid) AS tidx_blks_hit

ISTM that X.indrelid indicates the TOAST table not the TOAST index.
Shouldn't we use X.indexrelid instead of X.indrelid?

Indeed good catch! We need in this case the statistics on the index
and here I used the table OID. Btw, I also noticed that as multiple
indexes may be involved for a given toast relation, it makes sense to
actually calculate tidx_blks_read and tidx_blks_hit as the sum of all
stats of the indexes.

Yep. You seem to need to change X.indexrelid to X.indrelid in GROUP clause.
Otherwise, you may get two rows of the same table from pg_statio_all_tables.

I changed it a little bit in a different way in my latest patch by
adding a sum on all the indexes when getting tidx_blks stats.

doc/src/sgml/diskusage.sgml
There will be one index on the
<acronym>TOAST</> table, if present.

+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes

When I used gdb and tracked the code path of the concurrent reindex patch,
I found it's possible that more than one *valid* toast indexes appear. Those
multiple valid toast indexes are viewable, for example, from pg_indexes.
I'm not sure whether this is the bug of concurrent reindex patch. But
if it's not,
you seem to need to change the above description again.Not sure about that. The latest code is made such as only one valid
index is present on the toast relation at the same time.*** a/src/bin/pg_dump/pg_dump.c + "SELECT c.reltoastrelid, t.indexrelid " "FROM pg_catalog.pg_class c LEFT JOIN " - "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) " - "WHERE c.oid = '%u'::pg_catalog.oid;", + "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) " + "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid " + "LIMIT 1",Is there the case where TOAST table has more than one *valid* indexes?
I just rechecked the patch and is answer is no. The concurrent index
is set as valid inside the same transaction as swap. So only the
backend performing the swap will be able to see two valid toast
indexes at the same time.

According to my quick gdb testing, this seems not to be true....
Well, I have to disagree. I am not able to reproduce it. Which version
did you use? Here is what I get with the latest version of REINDEX
CONCURRENTLY patch... I checked with the following process:
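A check of this kind can also be expressed in plain SQL (a sketch, not the exact commands used in the thread; 'my_table' is an illustrative name). Seeing more than one row with indisvalid = true would demonstrate the behavior under discussion:

```sql
-- List the indexes of a table's TOAST relation with their validity.
SELECT i.indexrelid::regclass AS toast_index, i.indisvalid
FROM pg_class c
     JOIN pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.oid = 'my_table'::regclass;
```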
Sorry. This is my mistake.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Sun, Jun 23, 2013 at 3:34 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
OK. Please find an updated patch for the toast part.
On Sat, Jun 22, 2013 at 10:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-22 22:45:26 +0900, Michael Paquier wrote:
On Sat, Jun 22, 2013 at 10:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-22 12:50:52 +0900, Michael Paquier wrote:
By looking at the comments of RelationGetIndexList:relcache.c,
actually the method of the patch is correct because in the event of a
shared cache invalidation, rd_indexvalid is set to 0 when the index
list is reset, so the index list would get recomputed even in the case
of shared mem invalidation.

The problem I see is something else. Consider code like the following:
RelationFetchIndexListIfInvalid(toastrel);
foreach(lc, toastrel->rd_indexlist)
toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);

index_open calls relation_open calls LockRelationOid which does:
if (res != LOCKACQUIRE_ALREADY_HELD)
AcceptInvalidationMessages();

So, what might happen is that you open the first index, which accepts an
invalidation message which in turn might delete the indexlist. Which
means we would likely read invalid memory if there are two indexes.

And I imagine that you have the same problem even with
RelationGetIndexList, not only RelationGetIndexListIfInvalid, because
this would appear as long as you try to open more than 1 index with an
index list.

No. RelationGetIndexList() returns a copy of the list for exactly that
reason. The danger is not to see an outdated list - we should be
protected by locks against that - but looking at uninitialized or reused
memory.

OK, so I removed RelationGetIndexListIfInvalid (such things could be
an optimization for another patch) and replaced it by calls to
RelationGetIndexList to get a copy of rd_indexlist in a local list
variable, list free'd when it is not necessary anymore.

It looks that there is nothing left for this patch, no?
Compile error ;)
gcc -O0 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -g -I../../../src/include -c -o index.o index.c
index.c: In function 'index_constraint_create':
index.c:1257: error: too many arguments to function 'index_update_stats'
index.c: At top level:
index.c:1785: error: conflicting types for 'index_update_stats'
index.c:106: error: previous declaration of 'index_update_stats' was here
index.c: In function 'index_update_stats':
index.c:1881: error: 'FormData_pg_class' has no member named 'reltoastidxid'
index.c:1883: error: 'FormData_pg_class' has no member named 'reltoastidxid'
make[3]: *** [index.o] Error 1
make[2]: *** [catalog-recursive] Error 2
make[1]: *** [install-backend-recurse] Error 2
make: *** [install-src-recurse] Error 2
Regards,
--
Fujii Masao
On Mon, Jun 24, 2013 at 7:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Compile error ;)
It looks like filterdiff did not work correctly when generating the
latest patch with context diffs; I cannot apply it cleanly either.
This is perhaps due to a wrong manipulation from me. Please try the
attached that has been generated as a raw git output. It works
correctly with a git apply. I just checked.
--
Michael
Attachments:
20130624_1_remove_reltoastidxid_v12.patch
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..18daf1c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,19 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indisvalid "
+ " AND indrelid IN (SELECT reltoastrelid "
+ " FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u)",
+ InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e638a8f..f3d1d9e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..461deb9 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -20,12 +20,12 @@
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
- table (see <xref linkend="storage-toast">). There will be one index on the
- <acronym>TOAST</> table, if present. There also might be indexes associated
- with the base table. Each table and index is stored in a separate disk
- file — possibly more than one file, if the file would exceed one
- gigabyte. Naming conventions for these files are described in <xref
- linkend="storage-file-layout">.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
+ associated with the base table. Each table and index is stored in a
+ separate disk file — possibly more than one file, if the file would
+ exceed one gigabyte. Naming conventions for these files are described
+ in <xref linkend="storage-file-layout">.
</para>
<para>
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..d38c009 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1163,12 +1163,12 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
- <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
+ <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
- <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
+ <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..8457777 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,13 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static Relation toast_index_fetch_valid(Relation *toastidxs, int num_indexes);
/* ----------
@@ -1222,6 +1224,49 @@ toast_compress_datum(Datum value)
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of the given toast relation. A toast relation can
+ * only have one valid index at a time. The locks taken on the index
+ * relations are released at the end of this function call.
+ */
+Oid
+toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+{
+ ListCell *lc;
+ List *indexlist;
+ int num_indexes, i = 0;
+ Oid validIndexOid;
+ Relation validIndexRel;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Get the index list of relation */
+ toastrel = heap_open(toastoid, lock);
+ indexlist = RelationGetIndexList(toastrel);
+ num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch valid toast index */
+ validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
+ validIndexOid = RelationGetRelid(validIndexRel);
+
+ /* Close all the index relations */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+ list_free(indexlist);
+
+ heap_close(toastrel, lock);
+ return validIndexOid;
+}
+
+
+/* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
@@ -1237,8 +1282,8 @@ static Datum
toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1302,30 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ ListCell *lc;
+ List *indexlist;
+ int i = 0;
+ int num_indexes;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ indexlist = RelationGetIndexList(toastrel);
+ num_indexes = list_length(indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of toast relation with similar lock */
+ foreach(lc, indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch the valid index relation to use */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1390,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
}
else
@@ -1367,7 +1427,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,7 +1445,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(validtoastidx),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1423,16 +1484,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1447,10 +1510,13 @@ toast_save_datum(Relation rel, Datum value,
}
/*
- * Done - close toast relation
+ * Done - close toast relations
*/
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
+ list_free(indexlist);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
/*
* Create the TOAST pointer value that we'll return
@@ -1474,11 +1540,15 @@ toast_delete_datum(Relation rel, Datum value)
{
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ ListCell *lc;
+ List *indexlist;
+ int num_indexes;
+ int i = 0;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1557,22 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+ indexlist = RelationGetIndexList(toastrel);
+ num_indexes = list_length(indexlist);
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /*
+ * We actually use only the first valid index, but we need to take a
+ * lock on all of them.
+ */
+ foreach(lc, indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), RowExclusiveLock);
+
+ /* Fetch the valid index relation to use */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1587,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,8 +1601,11 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], RowExclusiveLock);
+ list_free(indexlist);
heap_close(toastrel, RowExclusiveLock);
+ pfree(toastidxs);
}
@@ -1531,11 +1616,29 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int i = 0;
+ int num_indexes;
+ Relation *toastidxs;
+ Relation validtoastidx;
+ ListCell *lc;
+ List *indexlist;
+
+ /* Ensure that the list of indexes of toast relation is computed */
+ indexlist = RelationGetIndexList(toastrel);
+ num_indexes = list_length(indexlist);
+
+ /* Open each index relation necessary */
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+ /* Fetch a valid index relation */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,7 +1651,8 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(validtoastidx),
true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
@@ -1556,6 +1660,12 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
systable_endscan(toastscan);
+ /* Clean up */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lockmode);
+ list_free(indexlist);
+ pfree(toastidxs);
+
return result;
}
@@ -1573,7 +1683,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
@@ -1591,8 +1701,8 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1717,10 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ ListCell *lc;
+ List *indexlist;
+ int num_indexes;
+ int i = 0;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1736,21 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ indexlist = RelationGetIndexList(toastrel);
+ num_indexes = list_length(indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ /* Open all the indexes of the toast relation with the same lock */
+ foreach(lc, indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+
+ /* Fetch the valid index relation to use */
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1769,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,8 +1858,11 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
+ list_free(indexlist);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
@@ -1750,8 +1877,8 @@ toast_fetch_datum(struct varlena * attr)
static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
- Relation toastrel;
- Relation toastidx;
+ Relation toastrel, validtoastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1901,10 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int i = 0;
+ ListCell *lc;
+ List *indexlist;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1947,18 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+ indexlist = RelationGetIndexList(toastrel);
+ num_indexes = list_length(indexlist);
+
+ toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+
+ foreach(lc, indexlist)
+ toastidxs[i++] = index_open(lfirst_oid(lc), AccessShareLock);
+ validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1999,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, validtoastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2096,37 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], AccessShareLock);
+ list_free(indexlist);
heap_close(toastrel, AccessShareLock);
+ pfree(toastidxs);
return result;
}
+
+/* ----------
+ * toast_index_fetch_valid
+ *
+ * Get a valid index from the list of indexes of a toast relation. The
+ * index relations must already be open before calling this routine.
+ */
+static Relation
+toast_index_fetch_valid(Relation *toastidxs, int num_indexes)
+{
+ int i;
+ Relation res = NULL;
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < num_indexes; i++)
+ {
+ if (toastidxs[i]->rd_index->indisvalid)
+ {
+ res = toastidxs[i];
+ break;
+ }
+ }
+
+ Assert(res);
+ return res;
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 45a84e4..e08954e 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -781,7 +781,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..e196a0c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1072,7 +1072,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1254,7 +1253,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1764,8 +1762,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1781,8 +1777,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1876,15 +1873,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2072,14 +2060,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..3c2a474 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ sum(pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
+ sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..e4747a9 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,6 +21,7 @@
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+#include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
@@ -1172,8 +1173,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1393,18 +1392,31 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * valid index. The swap can be done safely only if both relations
+ * have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- is_internal,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Oid toastIndex1, toastIndex2;
+
+ /* Get valid index for each relation */
+ toastIndex1 = toast_get_valid_index(relform1->reltoastrelid,
+ AccessExclusiveLock);
+ toastIndex2 = toast_get_valid_index(relform2->reltoastrelid,
+ AccessExclusiveLock);
+
+ if (toastIndex1 && toastIndex2)
+ swap_relation_files(toastIndex1,
+ toastIndex2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ is_internal,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1528,14 +1540,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
- Relation toastrel;
Oid toastidx;
char NewToastName[NAMEDATALEN];
- toastrel = relation_open(newrel->rd_rel->reltoastrelid,
- AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
- relation_close(toastrel, AccessShareLock);
+ /* Get the associated valid index to be renamed */
+ toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
+ AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1543,9 +1553,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
+ /* ... and its valid index too. */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
+
RenameRelationInternal(toastidx,
NewToastName, true);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8294b29..2b777da 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8728,7 +8728,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8736,6 +8735,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on the toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8863,8 +8870,15 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ {
+ Oid toastidxid = lfirst_oid(lc);
+ if (OidIsValid(toastidxid))
+ ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
+ }
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index fb57621..f721dbb 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -575,8 +575,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -588,7 +588,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..bdef79b 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,9 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +352,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ indexlist = RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
+ list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ec956ad..ccd5663 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2781,16 +2781,16 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
+ pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2816,7 +2816,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* every toast table has one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 6f4fc45..5890290 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -15,6 +15,7 @@
#include "access/htup_details.h"
#include "utils/relcache.h"
+#include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
@@ -188,4 +189,12 @@ extern Size toast_raw_datum_size(Datum value);
*/
extern Size toast_datum_size(Datum value);
+/* ----------
+ * toast_get_valid_index -
+ *
+ * Return the valid index associated with a toast relation
+ * ----------
+ */
+extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 2225787..49c4f6f 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -94,7 +93,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 29
+#define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,23 +106,22 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relfrozenxid 26
-#define Anum_pg_class_relminmxid 27
-#define Anum_pg_class_relacl 28
-#define Anum_pg_class_reloptions 29
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relispopulated 24
+#define Anum_pg_class_relfrozenxid 25
+#define Anum_pg_class_relminmxid 26
+#define Anum_pg_class_relacl 27
+#define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
@@ -138,13 +136,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..a6444a0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
+ | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index d5a3571..6361297 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
On 2013-06-24 07:46:34 +0900, Michael Paquier wrote:
On Mon, Jun 24, 2013 at 7:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Compile error ;)
It looks like filterdiff did not work correctly when generating the
latest patch with context diffs; I cannot apply it cleanly either.
This is perhaps due to a mistake on my part. Please try the
attached that has been generated as a raw git output. It works
correctly with a git apply. I just checked.
Did you check whether that introduces a performance regression?
 /* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of given toast relation. A toast relation can only
+ * have one valid index at the same time. The lock taken on the index
+ * relations is released at the end of this function call.
+ */
+Oid
+toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+{
+	ListCell   *lc;
+	List	   *indexlist;
+	int			num_indexes, i = 0;
+	Oid			validIndexOid;
+	Relation	validIndexRel;
+	Relation   *toastidxs;
+	Relation	toastrel;
+
+	/* Get the index list of relation */
+	toastrel = heap_open(toastoid, lock);
+	indexlist = RelationGetIndexList(toastrel);
+	num_indexes = list_length(indexlist);
+
+	/* Open all the index relations */
+	toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+	foreach(lc, indexlist)
+		toastidxs[i++] = index_open(lfirst_oid(lc), lock);
+
+	/* Fetch valid toast index */
+	validIndexRel = toast_index_fetch_valid(toastidxs, num_indexes);
+	validIndexOid = RelationGetRelid(validIndexRel);
+
+	/* Close all the index relations */
+	for (i = 0; i < num_indexes; i++)
+		index_close(toastidxs[i], lock);
+	pfree(toastidxs);
+	list_free(indexlist);
+
+	heap_close(toastrel, lock);
+	return validIndexOid;
+}
Just to make sure, could you check we've found a valid index?
 static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
 {
 	bool		result = false;
 	ScanKeyData toastkey;
 	SysScanDesc toastscan;
+	int			i = 0;
+	int			num_indexes;
+	Relation   *toastidxs;
+	Relation	validtoastidx;
+	ListCell   *lc;
+	List	   *indexlist;
+
+	/* Ensure that the list of indexes of toast relation is computed */
+	indexlist = RelationGetIndexList(toastrel);
+	num_indexes = list_length(indexlist);
+
+	/* Open each index relation necessary */
+	toastidxs = (Relation *) palloc(num_indexes * sizeof(Relation));
+	foreach(lc, indexlist)
+		toastidxs[i++] = index_open(lfirst_oid(lc), lockmode);
+
+	/* Fetch a valid index relation */
+	validtoastidx = toast_index_fetch_valid(toastidxs, num_indexes);
Those 10 lines are repeated multiple times, in different
functions. Maybe move them into toast_index_fetch_valid and rename that
to
Relation *
toast_open_indexes(Relation toastrel, LOCKMODE mode, size_t *numindexes, size_t valididx);
That way we also wouldn't fetch/copy the indexlist twice in some
functions.
+	/* Clean up */
+	for (i = 0; i < num_indexes; i++)
+		index_close(toastidxs[i], lockmode);
+	list_free(indexlist);
+	pfree(toastidxs);
The indexlist could already be freed inside the function proposed
above...
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8294b29..2b777da 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 				 errmsg("cannot move temporary tables of other sessions")));
+	foreach(lc, reltoastidxids)
+	{
+		Oid toastidxid = lfirst_oid(lc);
+
+		if (OidIsValid(toastidxid))
+			ATExecSetTableSpace(toastidxid, newTableSpace, lockmode);
+	}
Copy & pasted OidIsValid(), shouldn't be necessary anymore.
Otherwise I think there's not really much left to be done. Fujii?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Andres Freund <andres@2ndquadrant.com> writes:
Otherwise I think there's not really much left to be done. Fujii?
Well, other than the fact that we've not got MVCC catalog scans yet.
regards, tom lane
On 2013-06-24 09:57:24 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Otherwise I think there's not really much left to be done. Fujii?
Well, other than the fact that we've not got MVCC catalog scans yet.
That statement was only about the patch dealing with the removal of
reltoastidxid.
Greetings,
Andres Freund
On Mon, Jun 24, 2013 at 7:39 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Otherwise I think there's not really much left to be done. Fujii?
Yep, will check.
Regards,
--
Fujii Masao
On Mon, Jun 24, 2013 at 11:06 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-24 09:57:24 -0400, Tom Lane wrote:
Andres Freund <andres@2ndquadrant.com> writes:
Otherwise I think there's not really much left to be done. Fujii?
Well, other than the fact that we've not got MVCC catalog scans yet.
That statement was only about the patch dealing with the removal of
reltoastidxid.
Partially my mistake. It is not that obvious just based on the name of
this thread, so I should have moved the review of this particular
patch to another thread.
--
Michael
Patch updated according to comments.
On Mon, Jun 24, 2013 at 7:39 PM, Andres Freund <andres@2ndquadrant.com> wrote:
Did you check whether that introduces a performance regression?
I don't notice any difference. Here are some results on one of my
boxes with a single client using your previous test case.
master:
tps = 1753.374740 (including connections establishing)
tps = 1753.505288 (excluding connections establishing)
master + patch:
tps = 1738.354976 (including connections establishing)
tps = 1738.482424 (excluding connections establishing)
Just to make sure, could you check we've found a valid index?
Added an elog(ERROR) if valid index is not found.
Those 10 lines are repeated multiple times, in different
functions. Maybe move them into toast_index_fetch_valid and rename that
to
Relation *
toast_open_indexes(Relation toastrel, LOCKMODE mode, size_t *numindexes, size_t valididx);
That way we also wouldn't fetch/copy the indexlist twice in some
functions.
Good suggestion, this makes the code cleaner. However I didn't use
exactly what you suggested:
static int toast_open_indexes(Relation toastrel,
LOCKMODE lock,
Relation **toastidxs,
int *num_indexes);
static void toast_close_indexes(Relation *toastidxs, int num_indexes,
LOCKMODE lock);
toast_open_indexes returns the position of the valid index in the array of
toast indexes. This looked clearer to me when coding.
The indexlist could already be freed inside the function proposed
above...
Done.
Copy & pasted OidIsValid(), shouldn't be necessary anymore.
Yep, indeed. If there are no indexes, the list would simply be empty.
Thanks for your patience.
--
Michael
Attachments:
20130625_1_remove_reltoastidxid_v13.patchapplication/octet-stream; name=20130625_1_remove_reltoastidxid_v13.patchDownload
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..18daf1c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,19 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indisvalid "
+ " AND indrelid IN (SELECT reltoastrelid "
+ " FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u)",
+ InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index e638a8f..f3d1d9e 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..461deb9 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -20,12 +20,12 @@
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
- table (see <xref linkend="storage-toast">). There will be one index on the
- <acronym>TOAST</> table, if present. There also might be indexes associated
- with the base table. Each table and index is stored in a separate disk
- file — possibly more than one file, if the file would exceed one
- gigabyte. Naming conventions for these files are described in <xref
- linkend="storage-file-layout">.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
+ associated with the base table. Each table and index is stored in a
+ separate disk file — possibly more than one file, if the file would
+ exceed one gigabyte. Naming conventions for these files are described
+ in <xref linkend="storage-file-layout">.
</para>
<para>
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..d38c009 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1163,12 +1163,12 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
- <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
+ <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
- <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
+ <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..e5355ff 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,18 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static int toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes);
+static void toast_close_indexes(Relation *toastidxs, int num_indexes,
+ LOCKMODE lock);
/* ----------
@@ -1222,6 +1229,41 @@ toast_compress_datum(Datum value)
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of given toast relation. A toast relation can only
+ * have one valid index at the same time. The lock taken on the index
+ * relations is released at the end of this function call.
+ */
+Oid
+toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+{
+ int num_indexes;
+ int validIndex;
+ Oid validIndexOid;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Get the index list of relation */
+ toastrel = heap_open(toastoid, lock);
+
+ /* Look for the valid index of relation */
+ validIndex = toast_open_indexes(toastrel,
+ lock,
+ &toastidxs,
+ &num_indexes);
+ validIndexOid = RelationGetRelid(toastidxs[validIndex]);
+
+ /* Close all the index relations */
+ toast_close_indexes(toastidxs, num_indexes, lock);
+
+ /* Close toast relation */
+ heap_close(toastrel, lock);
+ return validIndexOid;
+}
+
+
+/* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
@@ -1238,7 +1280,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1299,23 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ int num_indexes;
+ int validIndex;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch valid index used for process */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1380,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[validIndex]),
(AttrNumber) 1);
}
else
@@ -1367,7 +1417,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,8 +1435,8 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
- (AttrNumber) 1);
+ RelationGetRelid(toastidxs[validIndex]),
+ (AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
}
@@ -1405,6 +1456,8 @@ toast_save_datum(Relation rel, Datum value,
*/
while (data_todo > 0)
{
+ int i;
+
/*
* Calculate the size of this chunk
*/
@@ -1423,16 +1476,18 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
@@ -1447,9 +1502,9 @@ toast_save_datum(Relation rel, Datum value,
}
/*
- * Done - close toast relation
+ * Done - close toast relations
*/
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
@@ -1475,10 +1530,12 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ int num_indexes;
+ int validIndex;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1544,15 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch valid relation used for process */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1567,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,7 +1581,7 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
@@ -1531,11 +1593,20 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int num_indexes;
+ int validIndex;
+ Relation *toastidxs;
+
+ /* Fetch a valid index relation */
+ validIndex = toast_open_indexes(toastrel,
+ lockmode,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,14 +1619,18 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
- true, SnapshotToast, 1, &toastkey);
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(toastidxs[validIndex]),
+ true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
result = true;
systable_endscan(toastscan);
+ /* Clean up */
+ toast_close_indexes(toastidxs, num_indexes, lockmode);
+
return result;
}
@@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
heap_close(toastrel, AccessShareLock);
@@ -1592,7 +1667,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1682,8 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ int num_indexes;
+ int validIndex;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1699,16 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Fetch relation used for process */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1727,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,7 +1816,7 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
@@ -1751,7 +1833,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1856,8 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int validIndex;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1900,16 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Look for the valid index of toast relation */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1950,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2047,84 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
+
+/* ----------
+ * toast_open_indexes
+ *
+ * Get an array of index relations associated to the given toast relation
+ * and return as well the position of the valid index used by the toast
+ * relation in this array. It is the responsibility of the caller of this
+ * function to close the index relations as well as free them.
+ */
+static int
+toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);
+ *num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ *toastidxs = (Relation *) palloc(*num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ (*toastidxs)[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < *num_indexes; i++)
+ {
+ Relation toastidx = (*toastidxs)[i];
+ if (toastidx->rd_index->indisvalid)
+ {
+ res = i;
+ found = true;
+ break;
+ }
+ }
+
+ /*
+ * Free the index list; it is no longer needed once the relations are
+ * open and a valid index has been found.
+ */
+ list_free(indexlist);
+
+ /*
+ * The toast relation should always have one valid index, so something
+ * is wrong if none was found.
+ */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation with Oid %u",
+ RelationGetRelid(toastrel));
+
+ return res;
+}
+
+/* ----------
+ * toast_close_indexes
+ *
+ * Close an array of indexes for a toast relation and free it. This should
+ * be called for a set of index relations opened previously with
+ * toast_open_indexes.
+ */
+static void
+toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
+{
+ int i;
+
+ /* Close relations and clean up things */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 45a84e4..e08954e 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -781,7 +781,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..e196a0c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1072,7 +1072,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1254,7 +1253,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1764,8 +1762,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1781,8 +1777,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1876,15 +1873,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2072,14 +2060,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..3c2a474 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ sum(pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
+ sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..e4747a9 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,6 +21,7 @@
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+#include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
@@ -1172,8 +1173,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1393,18 +1392,31 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * valid index. The swap can actually be safely done only if the relations
+ * have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- is_internal,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Oid toastIndex1, toastIndex2;
+
+ /* Get valid index for each relation */
+ toastIndex1 = toast_get_valid_index(relform1->reltoastrelid,
+ AccessExclusiveLock);
+ toastIndex2 = toast_get_valid_index(relform2->reltoastrelid,
+ AccessExclusiveLock);
+
+ if (toastIndex1 && toastIndex2)
+ swap_relation_files(toastIndex1,
+ toastIndex2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ is_internal,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1528,14 +1540,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
- Relation toastrel;
Oid toastidx;
char NewToastName[NAMEDATALEN];
- toastrel = relation_open(newrel->rd_rel->reltoastrelid,
- AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
- relation_close(toastrel, AccessShareLock);
+ /* Get the associated valid index to be renamed */
+ toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
+ AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1543,9 +1553,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
+ /* ... and its valid index too. */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
+
RenameRelationInternal(toastidx,
NewToastName, true);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8294b29..f6df923 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8728,7 +8728,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8736,6 +8735,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8860,11 +8867,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Make sure the reltablespace change is visible */
CommandCounterIncrement();
- /* Move associated toast relation and/or index, too */
+ /* Move associated toast relation and/or indexes, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ ATExecSetTableSpace(lfirst_oid(lc), newTableSpace, lockmode);
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index fb57621..f721dbb 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -575,8 +575,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -588,7 +588,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..bdef79b 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,9 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +352,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ indexlist = RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
+ list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index ec956ad..ccd5663 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2781,16 +2781,16 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, t.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
- "WHERE c.oid = '%u'::pg_catalog.oid;",
+ "pg_catalog.pg_index t ON (c.reltoastrelid = t.indrelid) "
+ "WHERE c.oid = '%u'::pg_catalog.oid AND t.indisvalid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
+ pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2816,7 +2816,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* every toast table has one valid index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastidxid);
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 6f4fc45..5890290 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -15,6 +15,7 @@
#include "access/htup_details.h"
#include "utils/relcache.h"
+#include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
@@ -188,4 +189,12 @@ extern Size toast_raw_datum_size(Datum value);
*/
extern Size toast_datum_size(Datum value);
+/* ----------
+ * toast_get_valid_index -
+ *
+ * Return valid index associated to a toast relation
+ * ----------
+ */
+extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 2225787..49c4f6f 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -94,7 +93,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 29
+#define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,23 +106,22 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relfrozenxid 26
-#define Anum_pg_class_relminmxid 27
-#define Anum_pg_class_relacl 28
-#define Anum_pg_class_reloptions 29
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relispopulated 24
+#define Anum_pg_class_relfrozenxid 25
+#define Anum_pg_class_relminmxid 26
+#define Anum_pg_class_relacl 27
+#define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
@@ -138,13 +136,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..a6444a0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
+ | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index d5a3571..6361297 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
On Tue, Jun 25, 2013 at 8:15 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Patch updated according to comments.
Thanks for updating the patch!
When I ran VACUUM FULL, I got the following error.
ERROR: attempt to apply a mapping to unmapped relation 16404
STATEMENT: vacuum full;
Could you clarify why toast_save_datum needs to update even an invalid toast
index? Is that required only for REINDEX CONCURRENTLY?
@@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
toastid_valueid_exists() is used only in toast_save_datum(). So we should use
RowExclusiveLock here rather than AccessShareLock?
+ * toast_open_indexes
+ *
+ * Get an array of index relations associated to the given toast relation
+ * and return as well the position of the valid index used by the toast
+ * relation in this array. It is the responsability of the caller of this
Typo: responsibility
toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);
What about adding the assertion which checks that the return value
of RelationGetIndexList() is not NIL?
When I ran pg_upgrade for the upgrade from 9.2 to HEAD (with patch),
I got the following error. Without the patch, that succeeded.
command: "/dav/reindex/bin/pg_dump" --host "/dav/reindex" --port 50432
--username "postgres" --schema-only --quote-all-identifiers
--binary-upgrade --format=custom
--file="pg_upgrade_dump_12270.custom" "postgres" >>
"pg_upgrade_dump_12270.log" 2>&1
pg_dump: query returned 0 rows instead of one: SELECT c.reltoastrelid,
t.indexrelid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_index
t ON (c.reltoastrelid = t.indrelid) WHERE c.oid =
'16390'::pg_catalog.oid AND t.indisvalid;
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jun 26, 2013 at 1:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Thanks for updating the patch!
And thanks for taking the time to look at it. I updated the patch
according to your comments, except for the VACUUM FULL problem. Please
see patch attached and below for more details.
When I ran VACUUM FULL, I got the following error.
ERROR: attempt to apply a mapping to unmapped relation 16404
STATEMENT: vacuum full;
This can be reproduced by running VACUUM FULL on pg_proc,
pg_shdescription or pg_db_role_setting for example: relations that
have no relfilenode (mapped catalogs) and that have a toast relation. I still
have no idea what is happening here, but I am looking at it. As this
patch removes reltoastidxid, could that removal have an effect on the
relation mapping of mapped catalogs? Does someone have an idea?
Could you clarify why toast_save_datum needs to update even an invalid toast
index? Is that required only for REINDEX CONCURRENTLY?
Because an invalid index might be marked as indisready, hence ready to
receive inserts. Yes, this is a requirement for REINDEX CONCURRENTLY,
and more generally a requirement for any relation whose rd_indexlist
includes indexes that are live and ready but not valid. Based on this
remark I also spotted a bug in my patch in tuptoaster.c, where
toast_save_datum could insert a new index tuple entry for an index that
is live but not ready. Fixed that by adding a check on the flag
indisready before calling index_insert.
@@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);

toastid_valueid_exists() is used only in toast_save_datum(). So we should use
RowExclusiveLock here rather than AccessShareLock?
Makes sense.
+ * toast_open_indexes
+ *
+ * Get an array of index relations associated to the given toast relation
+ * and return as well the position of the valid index used by the toast
+ * relation in this array. It is the responsability of the caller of this

Typo: responsibility
Done.
toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);

What about adding the assertion which checks that the return value
of RelationGetIndexList() is not NIL?
Done.
When I ran pg_upgrade for the upgrade from 9.2 to HEAD (with patch),
I got the following error. Without the patch, that succeeded.

command: "/dav/reindex/bin/pg_dump" --host "/dav/reindex" --port 50432
--username "postgres" --schema-only --quote-all-identifiers
--binary-upgrade --format=custom
--file="pg_upgrade_dump_12270.custom" "postgres" >>
"pg_upgrade_dump_12270.log" 2>&1
pg_dump: query returned 0 rows instead of one: SELECT c.reltoastrelid,
t.indexrelid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_index
t ON (c.reltoastrelid = t.indrelid) WHERE c.oid =
'16390'::pg_catalog.oid AND t.indisvalid;
This issue is easily reproducible by having more than one table using
toast indexes in the cluster to be upgraded. The error occurred on the
pg_dump side when doing a binary upgrade. In order to fix that, I
changed binary_upgrade_set_pg_class_oids() in pg_dump.c to fetch the
index associated with a toast relation only if the table actually has
a toast relation. This adds one extra step in the process for each
relation having a toast table, but makes the code clearer. Note that I
checked pg_upgrade down to 8.4...
--
Michael
Attachments:
20130628_1_remove_reltoastidxid_v14.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..18daf1c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,19 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indisvalid "
+ " AND indrelid IN (SELECT reltoastrelid "
+ " FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u)",
+ InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 09f7e40..6715782 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..461deb9 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -20,12 +20,12 @@
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
- table (see <xref linkend="storage-toast">). There will be one index on the
- <acronym>TOAST</> table, if present. There also might be indexes associated
- with the base table. Each table and index is stored in a separate disk
- file — possibly more than one file, if the file would exceed one
- gigabyte. Naming conventions for these files are described in <xref
- linkend="storage-file-layout">.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
+ associated with the base table. Each table and index is stored in a
+ separate disk file — possibly more than one file, if the file would
+ exceed one gigabyte. Naming conventions for these files are described
+ in <xref linkend="storage-file-layout">.
</para>
<para>
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..d38c009 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1163,12 +1163,12 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
- <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
+ <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
- <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
+ <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..ecfe109 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,18 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static int toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes);
+static void toast_close_indexes(Relation *toastidxs, int num_indexes,
+ LOCKMODE lock);
/* ----------
@@ -1222,6 +1229,41 @@ toast_compress_datum(Datum value)
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of the given toast relation. A toast relation can
+ * have only one valid index at a time. The lock taken on the index
+ * relations is released at the end of this function call.
+ */
+Oid
+toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+{
+ int num_indexes;
+ int validIndex;
+ Oid validIndexOid;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Open the toast relation */
+ toastrel = heap_open(toastoid, lock);
+
+ /* Look for the valid index of relation */
+ validIndex = toast_open_indexes(toastrel,
+ lock,
+ &toastidxs,
+ &num_indexes);
+ validIndexOid = RelationGetRelid(toastidxs[validIndex]);
+
+ /* Close all the index relations */
+ toast_close_indexes(toastidxs, num_indexes, lock);
+
+ /* Close toast relation */
+ heap_close(toastrel, lock);
+ return validIndexOid;
+}
+
+
+/* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
@@ -1238,7 +1280,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1299,23 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ int num_indexes;
+ int validIndex;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch the valid index to use */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1380,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[validIndex]),
(AttrNumber) 1);
}
else
@@ -1367,7 +1417,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,8 +1435,8 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
- (AttrNumber) 1);
+ RelationGetRelid(toastidxs[validIndex]),
+ (AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
}
@@ -1405,6 +1456,8 @@ toast_save_datum(Relation rel, Datum value,
*/
while (data_todo > 0)
{
+ int i;
+
/*
* Calculate the size of this chunk
*/
@@ -1423,16 +1476,22 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ {
+ /* Only index relations marked as ready can be updated */
+ if (toastidxs[i]->rd_index->indisready)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ }
/*
* Free memory
@@ -1447,9 +1506,9 @@ toast_save_datum(Relation rel, Datum value,
}
/*
- * Done - close toast relation
+ * Done - close toast relations
*/
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
@@ -1475,10 +1534,12 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ int num_indexes;
+ int validIndex;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1548,15 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch the valid index to use */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1571,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,7 +1585,7 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
@@ -1531,11 +1597,20 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int num_indexes;
+ int validIndex;
+ Relation *toastidxs;
+
+ /* Fetch a valid index relation */
+ validIndex = toast_open_indexes(toastrel,
+ lockmode,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,14 +1623,18 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
- true, SnapshotToast, 1, &toastkey);
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(toastidxs[validIndex]),
+ true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
result = true;
systable_endscan(toastscan);
+ /* Clean up */
+ toast_close_indexes(toastidxs, num_indexes, lockmode);
+
return result;
}
@@ -1573,7 +1652,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, RowExclusiveLock);
heap_close(toastrel, AccessShareLock);
@@ -1592,7 +1671,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1686,8 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ int num_indexes;
+ int validIndex;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1703,16 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Fetch the valid index to use */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1731,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,7 +1820,7 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
@@ -1751,7 +1837,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1860,8 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int validIndex;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1904,16 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Look for the valid index of toast relation */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1954,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2051,86 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
+
+/* ----------
+ * toast_open_indexes
+ *
+ * Get an array of the index relations associated with the given toast
+ * relation, and return the position of the valid index in this array.
+ * It is the caller's responsibility to close the index relations and
+ * free the array.
+ */
+static int
+toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);
+ Assert(indexlist != NIL);
+
+ *num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ *toastidxs = (Relation *) palloc(*num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ (*toastidxs)[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < *num_indexes; i++)
+ {
+ Relation toastidx = (*toastidxs)[i];
+ if (toastidx->rd_index->indisvalid)
+ {
+ res = i;
+ found = true;
+ break;
+ }
+ }
+
+ /*
+ * Free the index list; it is no longer needed now that the relations
+ * are opened and a valid index has been found.
+ */
+ list_free(indexlist);
+
+ /*
+ * The toast relation must have at least one valid index, so something
+ * is wrong if none was found.
+ */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation with OID %u",
+ RelationGetRelid(toastrel));
+
+ return res;
+}
+
+/* ----------
+ * toast_close_indexes
+ *
+ * Close an array of index relations for a toast relation and free it.
+ * This should be called on a set of index relations previously opened
+ * with toast_open_indexes.
+ */
+static void
+toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
+{
+ int i;
+
+ /* Close the index relations and free the array */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 45a84e4..e08954e 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -781,7 +781,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..e196a0c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1072,7 +1072,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1254,7 +1253,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1764,8 +1762,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1781,8 +1777,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1876,15 +1873,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2072,14 +2060,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..3c2a474 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ sum(pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
+ sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..e4747a9 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,6 +21,7 @@
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+#include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
@@ -1172,8 +1173,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1393,18 +1392,31 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * valid indexes. The swap can only be done safely if both relations
+ * actually have a valid index.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
- target_is_pg_class,
- swap_toast_by_content,
- is_internal,
- InvalidTransactionId,
- InvalidMultiXactId,
- mapped_tables);
+ relform1->reltoastrelid &&
+ relform2->reltoastrelid)
+ {
+ Oid toastIndex1, toastIndex2;
+
+ /* Get valid index for each relation */
+ toastIndex1 = toast_get_valid_index(relform1->reltoastrelid,
+ AccessExclusiveLock);
+ toastIndex2 = toast_get_valid_index(relform2->reltoastrelid,
+ AccessExclusiveLock);
+
+ if (toastIndex1 && toastIndex2)
+ swap_relation_files(toastIndex1,
+ toastIndex2,
+ target_is_pg_class,
+ swap_toast_by_content,
+ is_internal,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ mapped_tables);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1528,14 +1540,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
- Relation toastrel;
Oid toastidx;
char NewToastName[NAMEDATALEN];
- toastrel = relation_open(newrel->rd_rel->reltoastrelid,
- AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
- relation_close(toastrel, AccessShareLock);
+ /* Get the associated valid index to be renamed */
+ toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
+ AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1543,9 +1553,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
+ /* ... and its valid index too. */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
+
RenameRelationInternal(toastidx,
NewToastName, true);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8294b29..f6df923 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8728,7 +8728,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8736,6 +8735,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8782,7 +8783,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -8860,11 +8867,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Make sure the reltablespace change is visible */
CommandCounterIncrement();
- /* Move associated toast relation and/or index, too */
+ /* Move associated toast relation and/or indexes, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ ATExecSetTableSpace(lfirst_oid(lc), newTableSpace, lockmode);
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index fb57621..f721dbb 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -575,8 +575,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -588,7 +588,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..bdef79b 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,9 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +352,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ indexlist = RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
+ list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index becc82b..208409a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
/* only tables have toast tables, not indexes */
if (OidIsValid(pg_class_reltoastrelid))
{
+ PQExpBuffer index_query = createPQExpBuffer();
+ PGresult *index_res;
+ Oid indexrelid;
+
/*
* One complexity is that the table definition might not require
* the creation of a TOAST table, and the TOAST table might have
@@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* Every toast table has one valid index, so fetch it first... */
+ appendPQExpBuffer(index_query,
+ "SELECT c.indexrelid "
+ "FROM pg_catalog.pg_index c "
+ "WHERE c.indrelid = %u;",
+ pg_class_reltoastrelid);
+ index_res = ExecuteSqlQueryForSingleRow(fout, index_query->data);
+ indexrelid = atooid(PQgetvalue(index_res, 0,
+ PQfnumber(index_res, "indexrelid")));
+
+ /* Then set it */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
+ indexrelid);
+
+ PQclear(index_res);
+ destroyPQExpBuffer(index_query);
}
}
else
@@ -13126,7 +13141,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
* attislocal correctly, plus fix up any inherited CHECK constraints.
* Analogously, we set up typed tables using ALTER TABLE / OF here.
*/
- if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
+ if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
tbinfo->relkind == RELKIND_FOREIGN_TABLE) )
{
for (j = 0; j < tbinfo->numatts; j++)
@@ -13151,7 +13166,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
else
appendPQExpBuffer(q, "ALTER FOREIGN TABLE %s ",
fmtId(tbinfo->dobj.name));
-
+
appendPQExpBuffer(q, "DROP COLUMN %s;\n",
fmtId(tbinfo->attnames[j]));
}
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 6f4fc45..5890290 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -15,6 +15,7 @@
#include "access/htup_details.h"
#include "utils/relcache.h"
+#include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
@@ -188,4 +189,12 @@ extern Size toast_raw_datum_size(Datum value);
*/
extern Size toast_datum_size(Datum value);
+/* ----------
+ * toast_get_valid_index -
+ *
+ * Return the valid index associated with a toast relation
+ * ----------
+ */
+extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 2225787..49c4f6f 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -94,7 +93,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 29
+#define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,23 +106,22 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relfrozenxid 26
-#define Anum_pg_class_relminmxid 27
-#define Anum_pg_class_relacl 28
-#define Anum_pg_class_reloptions 29
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relispopulated 24
+#define Anum_pg_class_relfrozenxid 25
+#define Anum_pg_class_relminmxid 26
+#define Anum_pg_class_relacl 27
+#define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
@@ -138,13 +136,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..a6444a0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
+ | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index d5a3571..6361297 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
On 2013-06-28 16:30:16 +0900, Michael Paquier wrote:
When I ran VACUUM FULL, I got the following error.
ERROR: attempt to apply a mapping to unmapped relation 16404
STATEMENT: vacuum full;
This can be reproduced by running a vacuum full on pg_proc,
pg_shdescription or pg_db_role_setting for example, that is, on
relations that have no relfilenode (mapped catalogs) and that have a
toast relation. I still have no idea what is happening here, but I am
looking at it. As this patch removes reltoastidxid, could that removal
have an effect on the relation mapping of mapped catalogs? Does anyone
have an idea?
I'd guess you broke the "swap_toast_by_content" case in cluster.c? We cannot
change the oid of a mapped relation (including indexes) since pg_class
in other databases wouldn't get the news.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
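As background for Andres' point: mapped relations are the catalogs whose file mapping is kept outside pg_class, so their relfilenode column is stored as 0. A catalog query along these lines (illustrative only) lists the mapped catalogs that also carry a TOAST table, which is exactly the set on which the VACUUM FULL failure was reported:

```sql
-- Illustrative sketch: list mapped catalogs (relfilenode = 0 in pg_class)
-- that also have a TOAST table; their real file mapping is kept in
-- pg_filenode.map, which is why their OIDs must never change.
SELECT c.oid::regclass AS catalog,
       c.reltoastrelid::regclass AS toast_table
FROM pg_class c
WHERE c.relfilenode = 0
  AND c.relkind = 'r'
  AND c.reltoastrelid <> 0;
```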
On Fri, Jun 28, 2013 at 4:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-28 16:30:16 +0900, Michael Paquier wrote:
When I ran VACUUM FULL, I got the following error.
ERROR: attempt to apply a mapping to unmapped relation 16404
STATEMENT: vacuum full;
This can be reproduced when doing a vacuum full on pg_proc,
pg_shdescription or pg_db_role_setting for example, or relations that
have no relfilenode (mapped catalogs), and a toast relation. I still
have no idea what is happening here but I am looking at it. As this
patch removes reltoastidxid, could that removal have effect on the
relation mapping of mapped catalogs? Does someone have an idea?
I'd guess you broke "swap_toast_by_content" case in cluster.c? We cannot
change the oid of a mapped relation (including indexes) since pg_class
in other databases wouldn't get the news.
Yeah, I thought that something was broken in swap_relation_files, but
after comparing the code paths taken by my code and master, and the
different function calls, I can't find any difference. I suspect that
something is wrong in tuptoaster.c, in the way toast index relations
are opened in order to get the OIDs to be swapped... but so far I have
found nothing; I am just not sure...
--
Michael
Hi all,
Please find attached an updated version of the patch removing
reltoastidxid (with and without context diffs); this version fixes the
vacuum full issue. With this fix, all the comments are addressed.
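Since reltoastidxid is gone, any code that previously read the toast index OID straight from pg_class has to look it up through pg_index instead. A rough equivalent of the old lookup (illustrative only; 'customer' is a placeholder table name):

```sql
-- Illustrative: fetch the valid index of a table's TOAST table through
-- pg_index, now that pg_class.reltoastidxid no longer exists.
-- 'customer' is a placeholder table name.
SELECT i.indexrelid::regclass AS toast_index
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.reltoastrelid
WHERE c.relname = 'customer'
  AND i.indisvalid;
```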
On Fri, Jun 28, 2013 at 5:07 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Jun 28, 2013 at 4:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-06-28 16:30:16 +0900, Michael Paquier wrote:
When I ran VACUUM FULL, I got the following error.
ERROR: attempt to apply a mapping to unmapped relation 16404
STATEMENT: vacuum full;
This can be reproduced when doing a vacuum full on pg_proc,
pg_shdescription or pg_db_role_setting for example, or relations that
have no relfilenode (mapped catalogs), and a toast relation. I still
have no idea what is happening here but I am looking at it. As this
patch removes reltoastidxid, could that removal have effect on the
relation mapping of mapped catalogs? Does someone have an idea?
I'd guess you broke "swap_toast_by_content" case in cluster.c? We cannot
change the oid of a mapped relation (including indexes) since pg_class
in other databases wouldn't get the news.
Yeah, I thought that something was broken in swap_relation_files, but
after comparing the code path taken by my code and master, and the
different function calls I can't find any difference. I'm assuming
that there is something wrong in tuptoaster.c in the fact of opening
toast index relations in order to get the Oids to be swapped... But so
far nothing I am just not sure...
The error was indeed in swap_relation_files, when trying to swap toast
indexes. The code path doing the toast index swap was taken not for
toast relations but for their parent relations, which seems to have
created weird behavior for mapped catalogs at the relation cache level.
Regards,
--
Michael
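For completeness, the failure fixed here could be reproduced with a sequence like the following (a sketch; before the fix, each of these statements failed with the mapping error quoted above):

```sql
-- Sketch of the reported reproduction: before the swap_relation_files fix,
-- VACUUM FULL on a mapped catalog with a TOAST table failed with
-- "attempt to apply a mapping to unmapped relation".
VACUUM FULL pg_proc;
VACUUM FULL pg_shdescription;
VACUUM FULL pg_db_role_setting;
```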
Attachments:
20130701_1_remove_reltoastidxid_v15.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..18daf1c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,19 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indisvalid "
+ " AND indrelid IN (SELECT reltoastrelid "
+ " FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u)",
+ InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 09f7e40..6715782 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..461deb9 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -20,12 +20,12 @@
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
- table (see <xref linkend="storage-toast">). There will be one index on the
- <acronym>TOAST</> table, if present. There also might be indexes associated
- with the base table. Each table and index is stored in a separate disk
- file — possibly more than one file, if the file would exceed one
- gigabyte. Naming conventions for these files are described in <xref
- linkend="storage-file-layout">.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
+ associated with the base table. Each table and index is stored in a
+ separate disk file — possibly more than one file, if the file would
+ exceed one gigabyte. Naming conventions for these files are described
+ in <xref linkend="storage-file-layout">.
</para>
<para>
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..d38c009 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1163,12 +1163,12 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
- <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
+ <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
- <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
+ <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index fc37ceb..ecfe109 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -76,11 +76,18 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static int toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes);
+static void toast_close_indexes(Relation *toastidxs, int num_indexes,
+ LOCKMODE lock);
/* ----------
@@ -1222,6 +1229,41 @@ toast_compress_datum(Datum value)
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of the given toast relation. A toast relation can
+ * only have one valid index at a time. The locks taken on the index
+ * relations are released at the end of this function call.
+ */
+Oid
+toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+{
+ int num_indexes;
+ int validIndex;
+ Oid validIndexOid;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Get the index list of relation */
+ toastrel = heap_open(toastoid, lock);
+
+ /* Look for the valid index of relation */
+ validIndex = toast_open_indexes(toastrel,
+ lock,
+ &toastidxs,
+ &num_indexes);
+ validIndexOid = RelationGetRelid(toastidxs[validIndex]);
+
+ /* Close all the index relations */
+ toast_close_indexes(toastidxs, num_indexes, lock);
+
+ /* Close toast relation */
+ heap_close(toastrel, lock);
+ return validIndexOid;
+}
+
+
+/* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
@@ -1238,7 +1280,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1257,15 +1299,23 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ int num_indexes;
+ int validIndex;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch valid index used for process */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1330,7 +1380,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[validIndex]),
(AttrNumber) 1);
}
else
@@ -1367,7 +1417,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1384,8 +1435,8 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
- (AttrNumber) 1);
+ RelationGetRelid(toastidxs[validIndex]),
+ (AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
}
@@ -1405,6 +1456,8 @@ toast_save_datum(Relation rel, Datum value,
*/
while (data_todo > 0)
{
+ int i;
+
/*
* Calculate the size of this chunk
*/
@@ -1423,16 +1476,22 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ {
+ /* Only index relations marked as ready can be updated */
+ if (toastidxs[i]->rd_index->indisready)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ }
/*
* Free memory
@@ -1447,9 +1506,9 @@ toast_save_datum(Relation rel, Datum value,
}
/*
- * Done - close toast relation
+ * Done - close toast relations
*/
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
@@ -1475,10 +1534,12 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ int num_indexes;
+ int validIndex;
if (!VARATT_IS_EXTERNAL(attr))
return;
@@ -1487,10 +1548,15 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch valid relation used for process */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1505,7 +1571,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1519,7 +1585,7 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
@@ -1531,11 +1597,20 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int num_indexes;
+ int validIndex;
+ Relation *toastidxs;
+
+ /* Fetch a valid index relation */
+ validIndex = toast_open_indexes(toastrel,
+ lockmode,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1548,14 +1623,18 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
- true, SnapshotToast, 1, &toastkey);
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(toastidxs[validIndex]),
+ true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
result = true;
systable_endscan(toastscan);
+ /* Clean up */
+ toast_close_indexes(toastidxs, num_indexes, lockmode);
+
return result;
}
@@ -1573,7 +1652,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, RowExclusiveLock);
heap_close(toastrel, AccessShareLock);
@@ -1592,7 +1671,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1607,6 +1686,8 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ int num_indexes;
+ int validIndex;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
@@ -1622,11 +1703,16 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Fetch relation used for process */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1645,7 +1731,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1734,7 +1820,7 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
@@ -1751,7 +1837,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1774,6 +1860,8 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int validIndex;
Assert(VARATT_IS_EXTERNAL(attr));
@@ -1816,11 +1904,16 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Look for the valid index of toast relation */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1861,7 +1954,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1958,8 +2051,86 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
+
+/* ----------
+ * toast_open_indexes
+ *
+ * Get an array of the index relations associated with the given toast relation
+ * and return as well the position of the valid index used by the toast
+ * relation in this array. It is the responsibility of the caller of this
+ * function to close the index relations as well as free them.
+ */
+static int
+toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);
+ Assert(indexlist != NIL);
+
+ *num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ *toastidxs = (Relation *) palloc(*num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ (*toastidxs)[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < *num_indexes; i++)
+ {
+ Relation toastidx = (*toastidxs)[i];
+ if (toastidx->rd_index->indisvalid)
+ {
+ res = i;
+ found = true;
+ break;
+ }
+ }
+
+ /*
+ * Free the index list; it is no longer needed now that the relations
+ * are open and a valid index has been found.
+ */
+ list_free(indexlist);
+
+ /*
+ * A toast relation should always have one valid index, so something
+ * is wrong if none was found.
+ */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation with Oid %u",
+ RelationGetRelid(toastrel));
+
+ return res;
+}
+
+/* ----------
+ * toast_close_indexes
+ *
+ * Close an array of indexes for a toast relation and free it. This should
+ * be called for a set of index relations opened previously with
+ * toast_open_indexes.
+ */
+static void
+toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
+{
+ int i;
+
+ /* Close relations and clean up things */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 45a84e4..e08954e 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -781,7 +781,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5f61ecb..e196a0c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1072,7 +1072,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1254,7 +1253,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1764,8 +1762,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1781,8 +1777,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1876,15 +1873,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2072,14 +2060,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..3c2a474 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ sum(pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
+ sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 095d5e4..9564fa2 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,6 +21,7 @@
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+#include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
@@ -1172,8 +1173,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1393,18 +1392,30 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * valid indexes. The swap can be done safely only if both relations
+ * have valid indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
+ relform1->relkind == RELKIND_TOASTVALUE &&
+ relform2->relkind == RELKIND_TOASTVALUE)
+ {
+ Oid toastIndex1, toastIndex2;
+
+ /* Get valid index for each relation */
+ toastIndex1 = toast_get_valid_index(r1,
+ AccessExclusiveLock);
+ toastIndex2 = toast_get_valid_index(r2,
+ AccessExclusiveLock);
+
+ swap_relation_files(toastIndex1,
+ toastIndex2,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1528,14 +1539,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
- Relation toastrel;
Oid toastidx;
char NewToastName[NAMEDATALEN];
- toastrel = relation_open(newrel->rd_rel->reltoastrelid,
- AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
- relation_close(toastrel, AccessShareLock);
+ /* Get the associated valid index to be renamed */
+ toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
+ AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1543,9 +1552,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
+ /* ... and its valid index too. */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
+
RenameRelationInternal(toastidx,
NewToastName, true);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ea1c309..a77c402 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8869,7 +8869,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8877,6 +8876,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8923,7 +8924,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on the toast relation, if any */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -9001,11 +9008,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Make sure the reltablespace change is visible */
CommandCounterIncrement();
- /* Move associated toast relation and/or index, too */
+ /* Move associated toast relation and/or indexes, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ ATExecSetTableSpace(lfirst_oid(lc), newTableSpace, lockmode);
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index fb57621..f721dbb 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -575,8 +575,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -588,7 +588,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 4c4e1ed..bdef79b 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,9 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +352,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ indexlist = RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
+ list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index becc82b..208409a 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
/* only tables have toast tables, not indexes */
if (OidIsValid(pg_class_reltoastrelid))
{
+ PQExpBuffer index_query = createPQExpBuffer();
+ PGresult *index_res;
+ Oid indexrelid;
+
/*
* One complexity is that the table definition might not require
* the creation of a TOAST table, and the TOAST table might have
@@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* Every toast table has one valid index, so fetch it first... */
+ appendPQExpBuffer(index_query,
+ "SELECT c.indexrelid "
+ "FROM pg_catalog.pg_index c "
+ "WHERE c.indrelid = %u;",
+ pg_class_reltoastrelid);
+ index_res = ExecuteSqlQueryForSingleRow(fout, index_query->data);
+ indexrelid = atooid(PQgetvalue(index_res, 0,
+ PQfnumber(index_res, "indexrelid")));
+
+ /* Then set it */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
+ indexrelid);
+
+ PQclear(index_res);
+ destroyPQExpBuffer(index_query);
}
}
else
@@ -13126,7 +13141,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
* attislocal correctly, plus fix up any inherited CHECK constraints.
* Analogously, we set up typed tables using ALTER TABLE / OF here.
*/
- if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
+ if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
tbinfo->relkind == RELKIND_FOREIGN_TABLE) )
{
for (j = 0; j < tbinfo->numatts; j++)
@@ -13151,7 +13166,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
else
appendPQExpBuffer(q, "ALTER FOREIGN TABLE %s ",
fmtId(tbinfo->dobj.name));
-
+
appendPQExpBuffer(q, "DROP COLUMN %s;\n",
fmtId(tbinfo->attnames[j]));
}
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index 6f4fc45..5890290 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -15,6 +15,7 @@
#include "access/htup_details.h"
#include "utils/relcache.h"
+#include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
@@ -188,4 +189,12 @@ extern Size toast_raw_datum_size(Datum value);
*/
extern Size toast_datum_size(Datum value);
+/* ----------
+ * toast_get_valid_index -
+ *
+ * Return the valid index associated with a toast relation
+ * ----------
+ */
+extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 2225787..49c4f6f 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -94,7 +93,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 29
+#define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,23 +106,22 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relfrozenxid 26
-#define Anum_pg_class_relminmxid 27
-#define Anum_pg_class_relacl 28
-#define Anum_pg_class_reloptions 29
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relispopulated 24
+#define Anum_pg_class_relfrozenxid 25
+#define Anum_pg_class_relminmxid 26
+#define Anum_pg_class_relacl 27
+#define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
@@ -138,13 +136,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..a6444a0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
+ | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index d5a3571..6361297 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
Attachment: 20130701_1_remove_reltoastidxid_v15_context.patch (application/octet-stream)
*** a/contrib/pg_upgrade/info.c
--- b/contrib/pg_upgrade/info.c
***************
*** 321,332 **** get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT reltoastidxid "
! "FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid"));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
--- 321,339 ----
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
! "SELECT indexrelid "
! "FROM pg_index "
! "WHERE indisvalid "
! " AND indrelid IN (SELECT reltoastrelid "
! " FROM info_rels i "
! " JOIN pg_catalog.pg_class c "
! " ON i.reloid = c.oid "
! " AND c.reltoastrelid != %u)",
! InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
***************
*** 1745,1759 ****
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
--- 1745,1750 ----
*** a/doc/src/sgml/diskusage.sgml
--- b/doc/src/sgml/diskusage.sgml
***************
*** 20,31 ****
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
! table (see <xref linkend="storage-toast">). There will be one index on the
! <acronym>TOAST</> table, if present. There also might be indexes associated
! with the base table. Each table and index is stored in a separate disk
! file — possibly more than one file, if the file would exceed one
! gigabyte. Naming conventions for these files are described in <xref
! linkend="storage-file-layout">.
</para>
<para>
--- 20,31 ----
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
! table (see <xref linkend="storage-toast">). There will be one valid index
! on the <acronym>TOAST</> table, if present. There also might be indexes
! associated with the base table. Each table and index is stored in a
! separate disk file — possibly more than one file, if the file would
! exceed one gigabyte. Naming conventions for these files are described
! in <xref linkend="storage-file-layout">.
</para>
<para>
***************
*** 44,50 ****
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
--- 44,50 ----
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
! pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
***************
*** 65,76 **** FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT reltoastidxid
! FROM pg_class
! WHERE oid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
--- 65,76 ----
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
! oid = (SELECT indexrelid
! FROM pg_index
! WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
! relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
***************
*** 87,93 **** WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
--- 87,93 ----
c2.oid = i.indexrelid
ORDER BY c2.relname;
! relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
***************
*** 101,107 **** SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
--- 101,107 ----
FROM pg_class
ORDER BY relpages DESC;
! relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***************
*** 1163,1174 **** postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
! <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
! <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
</row>
</tbody>
</tgroup>
--- 1163,1174 ----
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
! <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
! <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
*** a/src/backend/access/heap/tuptoaster.c
--- b/src/backend/access/heap/tuptoaster.c
***************
*** 76,86 **** do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
/* ----------
--- 76,93 ----
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
! static bool toastrel_valueid_exists(Relation toastrel,
! Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+ static int toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes);
+ static void toast_close_indexes(Relation *toastidxs, int num_indexes,
+ LOCKMODE lock);
/* ----------
***************
*** 1222,1227 **** toast_compress_datum(Datum value)
--- 1229,1269 ----
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of the given toast relation. A toast relation can
+ * have only one valid index at any given time. The lock taken on the index
+ * relations is released at the end of this function call.
+ */
+ Oid
+ toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+ {
+ int num_indexes;
+ int validIndex;
+ Oid validIndexOid;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Get the index list of relation */
+ toastrel = heap_open(toastoid, lock);
+
+ /* Look for the valid index of relation */
+ validIndex = toast_open_indexes(toastrel,
+ lock,
+ &toastidxs,
+ &num_indexes);
+ validIndexOid = RelationGetRelid(toastidxs[validIndex]);
+
+ /* Close all the index relations */
+ toast_close_indexes(toastidxs, num_indexes, lock);
+
+ /* Close toast relation */
+ heap_close(toastrel, lock);
+ return validIndexOid;
+ }
+
+
+ /* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
***************
*** 1238,1244 **** toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
! Relation toastidx;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
--- 1280,1286 ----
struct varlena * oldexternal, int options)
{
Relation toastrel;
! Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
***************
*** 1257,1271 **** toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
--- 1299,1321 ----
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ int num_indexes;
+ int validIndex;
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
! * additional columns besides OID. A toast table can have multiple identical
! * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
!
! /* Fetch the valid index to use for this process */
! validIndex = toast_open_indexes(toastrel,
! RowExclusiveLock,
! &toastidxs,
! &num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
***************
*** 1330,1336 **** toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
(AttrNumber) 1);
}
else
--- 1380,1386 ----
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidxs[validIndex]),
(AttrNumber) 1);
}
else
***************
*** 1367,1373 **** toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
--- 1417,1424 ----
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
! toast_pointer.va_valueid,
! RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
***************
*** 1384,1391 **** toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidx),
! (AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
}
--- 1435,1442 ----
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
! RelationGetRelid(toastidxs[validIndex]),
! (AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
}
***************
*** 1405,1410 **** toast_save_datum(Relation rel, Datum value,
--- 1456,1463 ----
*/
while (data_todo > 0)
{
+ int i;
+
/*
* Calculate the size of this chunk
*/
***************
*** 1423,1438 **** toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! index_insert(toastidx, t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidx->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
/*
* Free memory
--- 1476,1497 ----
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
! * are the same as the initial columns of the table for all the
! * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
! for (i = 0; i < num_indexes; i++)
! {
! /* Only index relations marked as ready can be updated */
! if (toastidxs[i]->rd_index->indisready)
! index_insert(toastidxs[i], t_values, t_isnull,
! &(toasttup->t_self),
! toastrel,
! toastidxs[i]->rd_index->indisunique ?
! UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
! }
/*
* Free memory
***************
*** 1447,1455 **** toast_save_datum(Relation rel, Datum value,
}
/*
! * Done - close toast relation
*/
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
--- 1506,1514 ----
}
/*
! * Done - close toast relations
*/
! toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
***************
*** 1475,1484 **** toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
if (!VARATT_IS_EXTERNAL(attr))
return;
--- 1534,1545 ----
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ int num_indexes;
+ int validIndex;
if (!VARATT_IS_EXTERNAL(attr))
return;
***************
*** 1487,1496 **** toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1548,1562 ----
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
!
! /* Fetch the valid index relation to use */
! validIndex = toast_open_indexes(toastrel,
! RowExclusiveLock,
! &toastidxs,
! &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1505,1511 **** toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1571,1577 ----
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
! toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1519,1525 **** toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
--- 1585,1591 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
***************
*** 1531,1541 **** toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
/*
* Setup a scan key to find chunks with matching va_valueid
--- 1597,1616 ----
* ----------
*/
static bool
! toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int num_indexes;
+ int validIndex;
+ Relation *toastidxs;
+
+ /* Fetch a valid index relation */
+ validIndex = toast_open_indexes(toastrel,
+ lockmode,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
***************
*** 1548,1561 **** toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
! true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
result = true;
systable_endscan(toastscan);
return result;
}
--- 1623,1640 ----
/*
* Is there any such chunk?
*/
! toastscan = systable_beginscan(toastrel,
! RelationGetRelid(toastidxs[validIndex]),
! true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
result = true;
systable_endscan(toastscan);
+ /* Clean up */
+ toast_close_indexes(toastidxs, num_indexes, lockmode);
+
return result;
}
***************
*** 1573,1579 **** toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid);
heap_close(toastrel, AccessShareLock);
--- 1652,1658 ----
toastrel = heap_open(toastrelid, AccessShareLock);
! result = toastrel_valueid_exists(toastrel, valueid, RowExclusiveLock);
heap_close(toastrel, AccessShareLock);
***************
*** 1592,1598 **** static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
--- 1671,1677 ----
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
! Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
***************
*** 1607,1612 **** toast_fetch_datum(struct varlena * attr)
--- 1686,1693 ----
bool isnull;
char *chunkdata;
int32 chunksize;
+ int num_indexes;
+ int validIndex;
/* Must copy to access aligned fields */
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
***************
*** 1622,1632 **** toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index by va_valueid
--- 1703,1718 ----
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
!
! /* Fetch the valid index relation to use */
! validIndex = toast_open_indexes(toastrel,
! AccessShareLock,
! &toastidxs,
! &num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
***************
*** 1645,1651 **** toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1731,1737 ----
*/
nextidx = 0;
! toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1734,1740 **** toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
--- 1820,1826 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
***************
*** 1751,1757 **** static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
! Relation toastidx;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
--- 1837,1843 ----
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
! Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
***************
*** 1774,1779 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
--- 1860,1867 ----
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int validIndex;
Assert(VARATT_IS_EXTERNAL(attr));
***************
*** 1816,1826 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its index
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
! toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
/*
* Setup a scan key to fetch from the index. This is either two keys or
--- 1904,1919 ----
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
! * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
!
! /* Look for the valid index of the toast relation */
! validIndex = toast_open_indexes(toastrel,
! AccessShareLock,
! &toastidxs,
! &num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
***************
*** 1861,1867 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, toastidx,
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
--- 1954,1960 ----
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
! toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
***************
*** 1958,1965 **** toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! index_close(toastidx, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
--- 2051,2136 ----
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
! toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
+
+ /* ----------
+ * toast_open_indexes
+ *
+ * Get an array of the index relations associated with the given toast
+ * relation, and return the position in this array of the valid index
+ * currently used by the toast relation. The caller is responsible for
+ * closing the index relations and freeing the array.
+ */
+ static int
+ toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+ {
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);
+ Assert(indexlist != NIL);
+
+ *num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ *toastidxs = (Relation *) palloc(*num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ (*toastidxs)[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < *num_indexes; i++)
+ {
+ Relation toastidx = (*toastidxs)[i];
+ if (toastidx->rd_index->indisvalid)
+ {
+ res = i;
+ found = true;
+ break;
+ }
+ }
+
+ /*
+ * Free the index list; it is no longer needed now that the relations are
+ * opened and a valid index has been found.
+ */
+ list_free(indexlist);
+
+ /*
+ * The toast relation should always have one valid index, so something
+ * is wrong if none was found.
+ */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation with OID %u",
+ RelationGetRelid(toastrel));
+
+ return res;
+ }
+
+ /* ----------
+ * toast_close_indexes
+ *
+ * Close an array of indexes for a toast relation and free it. This should
+ * be called for a set of index relations opened previously with
+ * toast_open_indexes.
+ */
+ static void
+ toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
+ {
+ int i;
+
+ /* Close relations and clean up things */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+ }
*** a/src/backend/catalog/heap.c
--- b/src/backend/catalog/heap.c
***************
*** 781,787 **** InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
--- 781,786 ----
*** a/src/backend/catalog/index.c
--- b/src/backend/catalog/index.c
***************
*** 103,109 **** static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! Oid reltoastidxid, double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
--- 103,109 ----
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
! double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
***************
*** 1072,1078 **** index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
--- 1072,1077 ----
***************
*** 1254,1260 **** index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
--- 1253,1258 ----
***************
*** 1764,1771 **** FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
--- 1762,1767 ----
***************
*** 1781,1788 **** FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
! bool hasindex, bool isprimary,
! Oid reltoastidxid, double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
--- 1777,1785 ----
*/
static void
index_update_stats(Relation rel,
! bool hasindex,
! bool isprimary,
! double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
***************
*** 1876,1890 **** index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
--- 1873,1878 ----
***************
*** 2072,2085 **** index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
--- 2060,2070 ----
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***************
*** 473,488 **** CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! pg_stat_get_blocks_fetched(X.oid) -
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
! pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_class X ON T.reltoastidxid = X.oid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
--- 473,488 ----
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
! sum(pg_stat_get_blocks_fetched(X.indexrelid) -
! pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
! sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
! pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
! GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***************
*** 21,26 ****
--- 21,27 ----
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+ #include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
***************
*** 1172,1179 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
--- 1173,1178 ----
***************
*** 1393,1410 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
! * indexes.
*/
if (swap_toast_by_content &&
! relform1->reltoastidxid && relform2->reltoastidxid)
! swap_relation_files(relform1->reltoastidxid,
! relform2->reltoastidxid,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
/* Clean up. */
heap_freetuple(reltup1);
--- 1392,1421 ----
/*
* If we're swapping two toast tables by content, do the same for their
! * valid indexes. The swap can be done safely only if both relations
! * are toast tables and thus actually have such indexes.
*/
if (swap_toast_by_content &&
! relform1->relkind == RELKIND_TOASTVALUE &&
! relform2->relkind == RELKIND_TOASTVALUE)
! {
! Oid toastIndex1, toastIndex2;
!
! /* Get valid index for each relation */
! toastIndex1 = toast_get_valid_index(r1,
! AccessExclusiveLock);
! toastIndex2 = toast_get_valid_index(r2,
! AccessExclusiveLock);
!
! swap_relation_files(toastIndex1,
! toastIndex2,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
+ }
/* Clean up. */
heap_freetuple(reltup1);
***************
*** 1528,1541 **** finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
- Relation toastrel;
Oid toastidx;
char NewToastName[NAMEDATALEN];
! toastrel = relation_open(newrel->rd_rel->reltoastrelid,
! AccessShareLock);
! toastidx = toastrel->rd_rel->reltoastidxid;
! relation_close(toastrel, AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
--- 1539,1550 ----
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
Oid toastidx;
char NewToastName[NAMEDATALEN];
! /* Get the associated valid index to be renamed */
! toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
! AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
***************
*** 1543,1551 **** finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
! /* ... and its index too */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
RenameRelationInternal(toastidx,
NewToastName, true);
}
--- 1552,1561 ----
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
! /* ... and its valid index too. */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
+
RenameRelationInternal(toastidx,
NewToastName, true);
}
*** a/src/backend/commands/tablecmds.c
--- b/src/backend/commands/tablecmds.c
***************
*** 8869,8875 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
--- 8869,8874 ----
***************
*** 8877,8882 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
--- 8876,8883 ----
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
***************
*** 8923,8929 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
! reltoastidxid = rel->rd_rel->reltoastidxid;
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
--- 8924,8936 ----
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
! /* Fetch the list of indexes on the toast relation if necessary */
! if (OidIsValid(reltoastrelid))
! {
! Relation toastRel = relation_open(reltoastrelid, lockmode);
! reltoastidxids = RelationGetIndexList(toastRel);
! relation_close(toastRel, lockmode);
! }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
***************
*** 9001,9011 **** ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Make sure the reltablespace change is visible */
CommandCounterIncrement();
! /* Move associated toast relation and/or index, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
! if (OidIsValid(reltoastidxid))
! ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
}
/*
--- 9008,9021 ----
/* Make sure the reltablespace change is visible */
CommandCounterIncrement();
! /* Move associated toast relation and/or indexes, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
! foreach(lc, reltoastidxids)
! ATExecSetTableSpace(lfirst_oid(lc), newTableSpace, lockmode);
!
! /* Clean up */
! list_free(reltoastidxids);
}
/*
*** a/src/backend/rewrite/rewriteDefine.c
--- b/src/backend/rewrite/rewriteDefine.c
***************
*** 575,582 **** DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid/reltoastidxid of
! * the toast table we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
--- 575,582 ----
/*
* Fix pg_class entry to look like a normal view's, including setting
! * the correct relkind and removal of reltoastrelid of the toast table
! * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
***************
*** 588,594 **** DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
--- 588,593 ----
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***************
*** 332,338 **** pg_relation_size(PG_FUNCTION_ARGS)
}
/*
! * Calculate total on-disk size of a TOAST relation, including its index.
* Must not be applied to non-TOAST relations.
*/
static int64
--- 332,338 ----
}
/*
! * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
***************
*** 340,347 **** calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
toastRel = relation_open(toastrelid, AccessShareLock);
--- 340,348 ----
{
int64 size = 0;
Relation toastRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
***************
*** 351,362 **** calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
! relation_close(toastIdxRel, AccessShareLock);
relation_close(toastRel, AccessShareLock);
return size;
--- 352,372 ----
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
! indexlist = RelationGetIndexList(toastRel);
! /* Size is calculated using all the indexes available */
! foreach(lc, indexlist)
! {
! Relation toastIdxRel;
! toastIdxRel = relation_open(lfirst_oid(lc),
! AccessShareLock);
! for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
! size += calculate_relation_size(&(toastIdxRel->rd_node),
! toastIdxRel->rd_backend, forkNum);
!
! relation_close(toastIdxRel, AccessShareLock);
! }
! list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
***************
*** 2778,2787 **** binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid, t.reltoastidxid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
--- 2778,2786 ----
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
appendPQExpBuffer(upgrade_query,
! "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
***************
*** 2790,2796 **** binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
--- 2789,2794 ----
***************
*** 2803,2808 **** binary_upgrade_set_pg_class_oids(Archive *fout,
--- 2801,2810 ----
/* only tables have toast tables, not indexes */
if (OidIsValid(pg_class_reltoastrelid))
{
+ PQExpBuffer index_query = createPQExpBuffer();
+ PGresult *index_res;
+ Oid indexrelid;
+
/*
* One complexity is that the table definition might not require
* the creation of a TOAST table, and the TOAST table might have
***************
*** 2816,2825 **** binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* every toast table has an index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
! pg_class_reltoastidxid);
}
}
else
--- 2818,2840 ----
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
! /* Every toast table has one valid index, so fetch it first... */
! appendPQExpBuffer(index_query,
! "SELECT c.indexrelid "
! "FROM pg_catalog.pg_index c "
! "WHERE c.indrelid = %u;",
! pg_class_reltoastrelid);
! index_res = ExecuteSqlQueryForSingleRow(fout, index_query->data);
! indexrelid = atooid(PQgetvalue(index_res, 0,
! PQfnumber(index_res, "indexrelid")));
!
! /* Then set it */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
! indexrelid);
!
! PQclear(index_res);
! destroyPQExpBuffer(index_query);
}
}
else
***************
*** 13126,13132 **** dumpTableSchema(Archive *fout, TableInfo *tbinfo)
* attislocal correctly, plus fix up any inherited CHECK constraints.
* Analogously, we set up typed tables using ALTER TABLE / OF here.
*/
! if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
tbinfo->relkind == RELKIND_FOREIGN_TABLE) )
{
for (j = 0; j < tbinfo->numatts; j++)
--- 13141,13147 ----
* attislocal correctly, plus fix up any inherited CHECK constraints.
* Analogously, we set up typed tables using ALTER TABLE / OF here.
*/
! if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
tbinfo->relkind == RELKIND_FOREIGN_TABLE) )
{
for (j = 0; j < tbinfo->numatts; j++)
***************
*** 13151,13157 **** dumpTableSchema(Archive *fout, TableInfo *tbinfo)
else
appendPQExpBuffer(q, "ALTER FOREIGN TABLE %s ",
fmtId(tbinfo->dobj.name));
!
appendPQExpBuffer(q, "DROP COLUMN %s;\n",
fmtId(tbinfo->attnames[j]));
}
--- 13166,13172 ----
else
appendPQExpBuffer(q, "ALTER FOREIGN TABLE %s ",
fmtId(tbinfo->dobj.name));
!
appendPQExpBuffer(q, "DROP COLUMN %s;\n",
fmtId(tbinfo->attnames[j]));
}
*** a/src/include/access/tuptoaster.h
--- b/src/include/access/tuptoaster.h
***************
*** 15,20 ****
--- 15,21 ----
#include "access/htup_details.h"
#include "utils/relcache.h"
+ #include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
***************
*** 188,191 **** extern Size toast_raw_datum_size(Datum value);
--- 189,200 ----
*/
extern Size toast_datum_size(Datum value);
+ /* ----------
+ * toast_get_valid_index -
+ *
+ * Return the OID of the valid index associated with a toast relation
+ * ----------
+ */
+ extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
*** a/src/include/catalog/pg_class.h
--- b/src/include/catalog/pg_class.h
***************
*** 48,54 **** CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
--- 48,53 ----
***************
*** 94,100 **** typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
! #define Natts_pg_class 29
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
--- 93,99 ----
* ----------------
*/
! #define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
***************
*** 107,129 **** typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
! #define Anum_pg_class_reltoastidxid 13
! #define Anum_pg_class_relhasindex 14
! #define Anum_pg_class_relisshared 15
! #define Anum_pg_class_relpersistence 16
! #define Anum_pg_class_relkind 17
! #define Anum_pg_class_relnatts 18
! #define Anum_pg_class_relchecks 19
! #define Anum_pg_class_relhasoids 20
! #define Anum_pg_class_relhaspkey 21
! #define Anum_pg_class_relhasrules 22
! #define Anum_pg_class_relhastriggers 23
! #define Anum_pg_class_relhassubclass 24
! #define Anum_pg_class_relispopulated 25
! #define Anum_pg_class_relfrozenxid 26
! #define Anum_pg_class_relminmxid 27
! #define Anum_pg_class_relacl 28
! #define Anum_pg_class_reloptions 29
/* ----------------
* initial contents of pg_class
--- 106,127 ----
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
! #define Anum_pg_class_relhasindex 13
! #define Anum_pg_class_relisshared 14
! #define Anum_pg_class_relpersistence 15
! #define Anum_pg_class_relkind 16
! #define Anum_pg_class_relnatts 17
! #define Anum_pg_class_relchecks 18
! #define Anum_pg_class_relhasoids 19
! #define Anum_pg_class_relhaspkey 20
! #define Anum_pg_class_relhasrules 21
! #define Anum_pg_class_relhastriggers 22
! #define Anum_pg_class_relhassubclass 23
! #define Anum_pg_class_relispopulated 24
! #define Anum_pg_class_relfrozenxid 25
! #define Anum_pg_class_relminmxid 26
! #define Anum_pg_class_relacl 27
! #define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
***************
*** 138,150 **** typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
! DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
--- 136,148 ----
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
! DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
! DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
*** a/src/test/regress/expected/oidjoins.out
--- b/src/test/regress/expected/oidjoins.out
***************
*** 353,366 **** WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
- ------+---------------
- (0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 353,358 ----
*** a/src/test/regress/expected/rules.out
--- b/src/test/regress/expected/rules.out
***************
*** 1852,1866 **** SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
! | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
--- 1852,1866 ----
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
! | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
! | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
! | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
! | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
***************
*** 2347,2357 **** select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | reltoastidxid | relkind | relfrozenxid
! ---------------+---------------+---------+--------------
! 0 | 0 | v | 0
(1 row)
drop view fooview;
--- 2347,2357 ----
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
! reltoastrelid | relkind | relfrozenxid
! ---------------+---------+--------------
! 0 | v | 0
(1 row)
drop view fooview;
*** a/src/test/regress/sql/oidjoins.sql
--- b/src/test/regress/sql/oidjoins.sql
***************
*** 177,186 **** SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
- SELECT ctid, reltoastidxid
- FROM pg_catalog.pg_class fk
- WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
--- 177,182 ----
*** a/src/test/regress/sql/rules.sql
--- b/src/test/regress/sql/rules.sql
***************
*** 872,878 **** create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, reltoastidxid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
--- 872,878 ----
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
! select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
*** a/src/tools/findoidjoins/README
--- b/src/tools/findoidjoins/README
***************
*** 86,92 **** Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
- Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
--- 86,91 ----
On Mon, Jul 1, 2013 at 9:31 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Hi all,
> Please find attached an updated version of the patch removing
> reltoastidxid (with and w/o context diffs), patch fixing the vacuum
> full issue. With this fix, all the comments are addressed.

Thanks for updating the patch!

I have one question related to the VACUUM FULL problem. What happens
if we run VACUUM FULL when there is an invalid toast index? Is the
invalid toast index rebuilt and marked as valid, i.e., can there be
multiple valid toast indexes?
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Jul 2, 2013 at 7:36 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Jul 1, 2013 at 9:31 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Hi all,
>> Please find attached an updated version of the patch removing
>> reltoastidxid (with and w/o context diffs), patch fixing the vacuum
>> full issue. With this fix, all the comments are addressed.
>
> Thanks for updating the patch!
> I have one question related to the VACUUM FULL problem. What happens
> if we run VACUUM FULL when there is an invalid toast index? Is the
> invalid toast index rebuilt and marked as valid, i.e., can there be
> multiple valid toast indexes?
Invalid toast indexes are not rebuilt. With the design of this patch, a
toast relation can only have one valid index at a time, and this is also
the path taken by REINDEX CONCURRENTLY for toast relations. This process
is managed by the following code in cluster.c; only the valid index of a
toast relation is taken into account when rebuilding relations:
***************
*** 1393,1410 **** swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
! * indexes.
*/
if (swap_toast_by_content &&
! relform1->reltoastidxid && relform2->reltoastidxid)
! swap_relation_files(relform1->reltoastidxid,
! relform2->reltoastidxid,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
/* Clean up. */
heap_freetuple(reltup1);
--- 1392,1421 ----
/*
* If we're swapping two toast tables by content, do the same for their
! * valid index. The swap can actually be safely done only if the relations
! * have indexes.
*/
if (swap_toast_by_content &&
! relform1->relkind == RELKIND_TOASTVALUE &&
! relform2->relkind == RELKIND_TOASTVALUE)
! {
! Oid toastIndex1, toastIndex2;
!
! /* Get valid index for each relation */
! toastIndex1 = toast_get_valid_index(r1, AccessExclusiveLock);
! toastIndex2 = toast_get_valid_index(r2, AccessExclusiveLock);
!
! swap_relation_files(toastIndex1,
! toastIndex2,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
+ }
Regards,
--
Michael
On Fri, Jun 28, 2013 at 4:30 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Wed, Jun 26, 2013 at 1:06 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> Thanks for updating the patch!
> And thanks for taking time to look at that. I updated the patch
> according to your comments, except for the VACUUM FULL problem. Please
> see patch attached and below for more details.
>
>> When I ran VACUUM FULL, I got the following error.
>>
>> ERROR: attempt to apply a mapping to unmapped relation 16404
>> STATEMENT: vacuum full;
> This can be reproduced when doing a vacuum full on pg_proc,
> pg_shdescription or pg_db_role_setting for example, or relations that
> have no relfilenode (mapped catalogs) and a toast relation. I still
> have no idea what is happening here but I am looking at it. As this
> patch removes reltoastidxid, could that removal have an effect on the
> relation mapping of mapped catalogs? Does someone have an idea?
>
>> Could you clarify why toast_save_datum needs to update even an invalid
>> toast index? Is it required only for REINDEX CONCURRENTLY?
> Because an invalid index might be marked as indisready, so ready to
> receive inserts. Yes, this is a requirement for REINDEX CONCURRENTLY,
> and more generally a requirement for a relation whose rd_indexlist
> includes indexes that are live and ready but not valid. Just based
> on this remark I spotted a bug in my patch for tuptoaster.c, where we
> could insert a new index tuple entry in toast_save_datum for an index
> that is live but not ready. Fixed that by adding an additional check
> of the indisready flag before calling index_insert.
>
>> @@ -1573,7 +1648,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
>> toastrel = heap_open(toastrelid, AccessShareLock);
>> - result = toastrel_valueid_exists(toastrel, valueid);
>> + result = toastrel_valueid_exists(toastrel, valueid, AccessShareLock);
>>
>> toastid_valueid_exists() is used only in toast_save_datum(). So we should use
>> RowExclusiveLock here rather than AccessShareLock?
> Makes sense.
>
>> + * toast_open_indexes
>> + *
>> + * Get an array of index relations associated to the given toast relation
>> + * and return as well the position of the valid index used by the toast
>> + * relation in this array. It is the responsability of the caller of this
>>
>> Typo: responsibility
> Done.
>
>> toast_open_indexes(Relation toastrel,
>> + LOCKMODE lock,
>> + Relation **toastidxs,
>> + int *num_indexes)
>> +{
>> + int i = 0;
>> + int res = 0;
>> + bool found = false;
>> + List *indexlist;
>> + ListCell *lc;
>> +
>> + /* Get index list of relation */
>> + indexlist = RelationGetIndexList(toastrel);
>>
>> What about adding the assertion which checks that the return value
>> of RelationGetIndexList() is not NIL?
> Done.
>
>> When I ran pg_upgrade for the upgrade from 9.2 to HEAD (with patch),
>> I got the following error. Without the patch, that succeeded.
>>
>> command: "/dav/reindex/bin/pg_dump" --host "/dav/reindex" --port 50432
>> --username "postgres" --schema-only --quote-all-identifiers
>> --binary-upgrade --format=custom
>> --file="pg_upgrade_dump_12270.custom" "postgres" >>
>> "pg_upgrade_dump_12270.log" 2>&1
>> pg_dump: query returned 0 rows instead of one: SELECT c.reltoastrelid,
>> t.indexrelid FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_index
>> t ON (c.reltoastrelid = t.indrelid) WHERE c.oid =
>> '16390'::pg_catalog.oid AND t.indisvalid;
> This issue is easily reproducible by having more than one table using
> toast indexes in the cluster to be upgraded. The error was on the
> pg_dump side when trying to do a binary upgrade. In order to fix that,
> I changed binary_upgrade_set_pg_class_oids() in pg_dump.c to fetch
> the index associated to a toast relation only if there is a toast
> relation. This adds one extra step in the process for each relation
> having a toast relation, but makes the code clearer. Note that I
> checked pg_upgrade down to 8.4...

Why did you remove the check of indisvalid from the --binary-upgrade SQL?
Without this check, if there is an invalid toast index, more than one row
is returned and ExecuteSqlQueryForSingleRow() would cause the error.

+ foreach(lc, indexlist)
+ *toastidxs[i++] = index_open(lfirst_oid(lc), lock);

*toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, a segmentation
fault can happen.
For now I've not found any other big problem except the above.
Regards,
--
Fujii Masao
On Wed, Jul 3, 2013 at 5:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> Why did you remove the check of indisvalid from the --binary-upgrade SQL?
> Without this check, if there is an invalid toast index, more than one row
> is returned and ExecuteSqlQueryForSingleRow() would cause the error.
>
> + foreach(lc, indexlist)
> + *toastidxs[i++] = index_open(lfirst_oid(lc), lock);
>
> *toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, a segmentation
> fault can happen.
>
> For now I've not found any other big problem except the above.
OK cool, updated version attached. If you guys think that the attached
version is fine (only the reltoastidxid removal part), perhaps it
would be worth committing it, as Robert also committed the MVCC catalog
patch today. That way we could focus on the core feature asap with the
2nd patch, and on the removal of AccessExclusiveLock at the swap step.
Regards,
--
Michael
Attachments:
20130704_1_remove_reltoastidxid_v16.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..18daf1c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,19 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indisvalid "
+ " AND indrelid IN (SELECT reltoastrelid "
+ " FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u)",
+ InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 09f7e40..6715782 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..461deb9 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -20,12 +20,12 @@
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
- table (see <xref linkend="storage-toast">). There will be one index on the
- <acronym>TOAST</> table, if present. There also might be indexes associated
- with the base table. Each table and index is stored in a separate disk
- file — possibly more than one file, if the file would exceed one
- gigabyte. Naming conventions for these files are described in <xref
- linkend="storage-file-layout">.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
+ associated with the base table. Each table and index is stored in a
+ separate disk file — possibly more than one file, if the file would
+ exceed one gigabyte. Naming conventions for these files are described
+ in <xref linkend="storage-file-layout">.
</para>
<para>
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..d38c009 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1163,12 +1163,12 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
- <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
+ <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
- <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
+ <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 445a7ed..6b59301 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -73,11 +73,18 @@ do { \
static void toast_delete_datum(Relation rel, Datum value);
static Datum toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options);
-static bool toastrel_valueid_exists(Relation toastrel, Oid valueid);
+static bool toastrel_valueid_exists(Relation toastrel,
+ Oid valueid, LOCKMODE lockmode);
static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static int toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes);
+static void toast_close_indexes(Relation *toastidxs, int num_indexes,
+ LOCKMODE lock);
/* ----------
@@ -1287,6 +1294,41 @@ toast_compress_datum(Datum value)
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of given toast relation. A toast relation can only
+ * have one valid index at the same time. The lock taken on the index
+ * relations is released at the end of this function call.
+ */
+Oid
+toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+{
+ int num_indexes;
+ int validIndex;
+ Oid validIndexOid;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Get the index list of relation */
+ toastrel = heap_open(toastoid, lock);
+
+ /* Look for the valid index of relation */
+ validIndex = toast_open_indexes(toastrel,
+ lock,
+ &toastidxs,
+ &num_indexes);
+ validIndexOid = RelationGetRelid(toastidxs[validIndex]);
+
+ /* Close all the index relations */
+ toast_close_indexes(toastidxs, num_indexes, lock);
+
+ /* Close toast relation */
+ heap_close(toastrel, lock);
+ return validIndexOid;
+}
+
+
+/* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
@@ -1303,7 +1345,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1322,17 +1364,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ int num_indexes;
+ int validIndex;
Assert(!VARATT_IS_EXTERNAL(value));
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated to it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch valid index used for process */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1397,7 +1447,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[validIndex]),
(AttrNumber) 1);
}
else
@@ -1434,7 +1484,8 @@ toast_save_datum(Relation rel, Datum value,
* be reclaimed by VACUUM.
*/
if (toastrel_valueid_exists(toastrel,
- toast_pointer.va_valueid))
+ toast_pointer.va_valueid,
+ RowExclusiveLock))
{
/* Match, so short-circuit the data storage loop below */
data_todo = 0;
@@ -1451,8 +1502,8 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
- (AttrNumber) 1);
+ RelationGetRelid(toastidxs[validIndex]),
+ (AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
}
@@ -1472,6 +1523,8 @@ toast_save_datum(Relation rel, Datum value,
*/
while (data_todo > 0)
{
+ int i;
+
/*
* Calculate the size of this chunk
*/
@@ -1490,16 +1543,22 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ {
+ /* Only index relations marked as ready can be updated */
+ if (toastidxs[i]->rd_index->indisready)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ }
/*
* Free memory
@@ -1514,9 +1573,9 @@ toast_save_datum(Relation rel, Datum value,
}
/*
- * Done - close toast relation
+ * Done - close toast relations
*/
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
@@ -1542,10 +1601,12 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ int num_indexes;
+ int validIndex;
if (!VARATT_IS_EXTERNAL_ONDISK(attr))
return;
@@ -1554,10 +1615,15 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch valid relation used for process */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1572,7 +1638,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1586,7 +1652,7 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
@@ -1598,11 +1664,20 @@ toast_delete_datum(Relation rel, Datum value)
* ----------
*/
static bool
-toastrel_valueid_exists(Relation toastrel, Oid valueid)
+toastrel_valueid_exists(Relation toastrel, Oid valueid, LOCKMODE lockmode)
{
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int num_indexes;
+ int validIndex;
+ Relation *toastidxs;
+
+ /* Fetch a valid index relation */
+ validIndex = toast_open_indexes(toastrel,
+ lockmode,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1615,14 +1690,18 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
- true, SnapshotToast, 1, &toastkey);
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(toastidxs[validIndex]),
+ true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
result = true;
systable_endscan(toastscan);
+ /* Clean up */
+ toast_close_indexes(toastidxs, num_indexes, lockmode);
+
return result;
}
@@ -1640,7 +1719,7 @@ toastid_valueid_exists(Oid toastrelid, Oid valueid)
toastrel = heap_open(toastrelid, AccessShareLock);
- result = toastrel_valueid_exists(toastrel, valueid);
+ result = toastrel_valueid_exists(toastrel, valueid, RowExclusiveLock);
heap_close(toastrel, AccessShareLock);
@@ -1659,7 +1738,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1674,6 +1753,8 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ int num_indexes;
+ int validIndex;
if (VARATT_IS_EXTERNAL_INDIRECT(attr))
elog(ERROR, "shouldn't be called for indirect tuples");
@@ -1692,11 +1773,16 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Fetch relation used for process */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1715,7 +1801,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1804,7 +1890,7 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
@@ -1821,7 +1907,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1844,6 +1930,8 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int validIndex;
Assert(VARATT_IS_EXTERNAL_ONDISK(attr));
@@ -1886,11 +1974,16 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Look for the valid index of the toast relation */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1931,7 +2024,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -2028,8 +2121,86 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
+
+/* ----------
+ * toast_open_indexes
+ *
+ * Get an array of the index relations associated with the given toast
+ * relation, and also return the position, within that array, of the valid
+ * index used by the toast relation. It is the responsibility of the caller
+ * to close the index relations and free the array.
+ */
+static int
+toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);
+ Assert(indexlist != NIL);
+
+ *num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ *toastidxs = (Relation *) palloc(*num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ (*toastidxs)[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < *num_indexes; i++)
+ {
+ Relation toastidx = (*toastidxs)[i];
+ if (toastidx->rd_index->indisvalid)
+ {
+ res = i;
+ found = true;
+ break;
+ }
+ }
+
+ /*
+ * Free the index list; it is no longer needed now that the relations
+ * are open and a valid index has been found.
+ */
+ list_free(indexlist);
+
+ /*
+ * The toast relation should always have one valid index, so
+ * something is wrong if none was found.
+ */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation with Oid %u",
+ RelationGetRelid(toastrel));
+
+ return res;
+}
+
+/* ----------
+ * toast_close_indexes
+ *
+ * Close an array of indexes for a toast relation and free it. This should
+ * be called for a set of index relations opened previously with
+ * toast_open_indexes.
+ */
+static void
+toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
+{
+ int i;
+
+ /* Close relations and clean up things */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4fd42ed..f1cdef9 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -781,7 +781,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ca0c672..8525cb9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1072,7 +1072,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1254,7 +1253,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1764,8 +1762,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1781,8 +1777,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1876,15 +1873,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2072,14 +2060,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..3c2a474 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ sum(pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
+ sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index f23730c..686770f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,6 +21,7 @@
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+#include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
@@ -1177,8 +1178,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1398,18 +1397,30 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * valid indexes. The swap can be done safely only if both relations
+ * actually have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
+ relform1->relkind == RELKIND_TOASTVALUE &&
+ relform2->relkind == RELKIND_TOASTVALUE)
+ {
+ Oid toastIndex1, toastIndex2;
+
+ /* Get valid index for each relation */
+ toastIndex1 = toast_get_valid_index(r1,
+ AccessExclusiveLock);
+ toastIndex2 = toast_get_valid_index(r2,
+ AccessExclusiveLock);
+
+ swap_relation_files(toastIndex1,
+ toastIndex2,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1533,14 +1544,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
- Relation toastrel;
Oid toastidx;
char NewToastName[NAMEDATALEN];
- toastrel = relation_open(newrel->rd_rel->reltoastrelid,
- AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
- relation_close(toastrel, AccessShareLock);
+ /* Get the associated valid index to be renamed */
+ toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
+ AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1548,9 +1557,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
+ /* ... and its valid index too. */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
+
RenameRelationInternal(toastidx,
NewToastName, true);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6a7aa44..6708725 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8878,7 +8878,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8886,6 +8885,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8932,7 +8933,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -9010,11 +9017,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Make sure the reltablespace change is visible */
CommandCounterIncrement();
- /* Move associated toast relation and/or index, too */
+ /* Move associated toast relation and/or indexes, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ ATExecSetTableSpace(lfirst_oid(lc), newTableSpace, lockmode);
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 3157aba..92396b3 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -579,8 +579,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -592,7 +592,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 5ddeffe..34482ab 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,9 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +352,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ indexlist = RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
+ list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 9ee9ea2..23e0373 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
/* only tables have toast tables, not indexes */
if (OidIsValid(pg_class_reltoastrelid))
{
+ PQExpBuffer index_query = createPQExpBuffer();
+ PGresult *index_res;
+ Oid indexrelid;
+
/*
* One complexity is that the table definition might not require
* the creation of a TOAST table, and the TOAST table might have
@@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* Every toast table has one valid index, so fetch it first... */
+ appendPQExpBuffer(index_query,
+ "SELECT c.indexrelid "
+ "FROM pg_catalog.pg_index c "
+ "WHERE c.indrelid = %u AND c.indisvalid;",
+ pg_class_reltoastrelid);
+ index_res = ExecuteSqlQueryForSingleRow(fout, index_query->data);
+ indexrelid = atooid(PQgetvalue(index_res, 0,
+ PQfnumber(index_res, "indexrelid")));
+
+ /* Then set it */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
+ indexrelid);
+
+ PQclear(index_res);
+ destroyPQExpBuffer(index_query);
}
}
else
@@ -13126,7 +13141,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
* attislocal correctly, plus fix up any inherited CHECK constraints.
* Analogously, we set up typed tables using ALTER TABLE / OF here.
*/
- if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
+ if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
tbinfo->relkind == RELKIND_FOREIGN_TABLE) )
{
for (j = 0; j < tbinfo->numatts; j++)
@@ -13151,7 +13166,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
else
appendPQExpBuffer(q, "ALTER FOREIGN TABLE %s ",
fmtId(tbinfo->dobj.name));
-
+
appendPQExpBuffer(q, "DROP COLUMN %s;\n",
fmtId(tbinfo->attnames[j]));
}
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index d0c17fd..110b954 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -15,6 +15,7 @@
#include "access/htup_details.h"
#include "utils/relcache.h"
+#include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
@@ -193,4 +194,12 @@ extern Size toast_raw_datum_size(Datum value);
*/
extern Size toast_datum_size(Datum value);
+/* ----------
+ * toast_get_valid_index -
+ *
+ * Return valid index associated to a toast relation
+ * ----------
+ */
+extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index d46fe9e..9358e95 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201306121
+#define CATALOG_VERSION_NO 201307031
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 2225787..49c4f6f 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -94,7 +93,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 29
+#define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,23 +106,22 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relfrozenxid 26
-#define Anum_pg_class_relminmxid 27
-#define Anum_pg_class_relacl 28
-#define Anum_pg_class_reloptions 29
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relispopulated 24
+#define Anum_pg_class_relfrozenxid 25
+#define Anum_pg_class_relminmxid 26
+#define Anum_pg_class_relacl 27
+#define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
@@ -138,13 +136,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..a6444a0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
+ | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index d5a3571..6361297 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
On Wed, Jul 3, 2013 at 5:43 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Wed, Jul 3, 2013 at 5:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
Why did you remove the check of indisvalid from the --binary-upgrade SQL?
Without this check, if there is an invalid toast index, more than one row is
returned and ExecuteSqlQueryForSingleRow() would raise an error.

+ foreach(lc, indexlist)
+ *toastidxs[i++] = index_open(lfirst_oid(lc), lock);

*toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, a segmentation fault
can happen.

For now I've not found any other big problem except the above.
system_views.sql

- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;

I found another problem. X.indexrelid should be X.indrelid. Otherwise, when
there is an invalid toast index, more than one row is returned for the same
relation.
OK cool, updated version attached. If you guys think that the attached
version is fine (only the reltoastidxid removal part), perhaps it
would be worth committing it, as Robert also committed the MVCC catalog
patch today. Then we would be able to focus on the core feature asap
with the 2nd patch, and on the removal of AccessExclusiveLock at the swap
step.
Yep, will do. Maybe today.
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Updated version of this patch attached. At the same time I changed
toastrel_valueid_exists back to its former shape by removing the extra
LOCKMODE argument I had added to pass a lock down to toast_open_indexes
and toast_close_indexes, as RowExclusiveLock is the only lock mode used
at all call sites.
On Wed, Jul 3, 2013 at 5:51 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Wed, Jul 3, 2013 at 5:43 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Wed, Jul 3, 2013 at 5:22 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

Why did you remove the check of indisvalid from the --binary-upgrade SQL?
Without this check, if there is an invalid toast index, more than one row is
returned and ExecuteSqlQueryForSingleRow() would raise an error.

+ foreach(lc, indexlist)
+ *toastidxs[i++] = index_open(lfirst_oid(lc), lock);

*toastidxs[i++] should be (*toastidxs)[i++]. Otherwise, a segmentation fault
can happen.

For now I've not found any other big problem except the above.

system_views.sql

- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indexrelid;

I found another problem. X.indexrelid should be X.indrelid. Otherwise, when
there is an invalid toast index, more than one row is returned for the same
relation.

Indeed, fixed
OK cool, updated version attached. If you guys think that the attached
version is fine (only the reltoastidxid removal part), perhaps it
would be worth committing it, as Robert also committed the MVCC catalog
patch today. Then we would be able to focus on the core feature asap
with the 2nd patch, and on the removal of AccessExclusiveLock at the swap
step.

Yep, will do. Maybe today.
I also double-checked with gdb and the REINDEX CONCURRENTLY patch
applied on top of the attached patch that the new code paths
introduced in tuptoaster.c are fine.
Regards,
--
Michael
Attachments:
20130701_1_remove_reltoastidxid_v17.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..18daf1c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,19 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indisvalid "
+ " AND indrelid IN (SELECT reltoastrelid "
+ " FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u)",
+ InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 09f7e40..6715782 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..461deb9 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -20,12 +20,12 @@
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
- table (see <xref linkend="storage-toast">). There will be one index on the
- <acronym>TOAST</> table, if present. There also might be indexes associated
- with the base table. Each table and index is stored in a separate disk
- file — possibly more than one file, if the file would exceed one
- gigabyte. Naming conventions for these files are described in <xref
- linkend="storage-file-layout">.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
+ associated with the base table. Each table and index is stored in a
+ separate disk file — possibly more than one file, if the file would
+ exceed one gigabyte. Naming conventions for these files are described
+ in <xref linkend="storage-file-layout">.
</para>
<para>
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..d38c009 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1163,12 +1163,12 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
- <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
+ <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
- <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
+ <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 445a7ed..a985fdf 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -78,6 +78,12 @@ static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static int toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes);
+static void toast_close_indexes(Relation *toastidxs, int num_indexes,
+ LOCKMODE lock);
/* ----------
@@ -1287,6 +1293,41 @@ toast_compress_datum(Datum value)
/* ----------
+ * toast_get_valid_index
+ *
+ * Get the valid index of the given toast relation. A toast relation can
+ * only have one valid index at a time. The lock taken on the index
+ * relations is released at the end of this function call.
+ */
+Oid
+toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+{
+ int num_indexes;
+ int validIndex;
+ Oid validIndexOid;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Open the toast relation */
+ toastrel = heap_open(toastoid, lock);
+
+ /* Look for the valid index of the relation */
+ validIndex = toast_open_indexes(toastrel,
+ lock,
+ &toastidxs,
+ &num_indexes);
+ validIndexOid = RelationGetRelid(toastidxs[validIndex]);
+
+ /* Close all the index relations */
+ toast_close_indexes(toastidxs, num_indexes, lock);
+
+ /* Close toast relation */
+ heap_close(toastrel, lock);
+ return validIndexOid;
+}
+
+
+/* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
@@ -1303,7 +1344,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1322,17 +1363,25 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ int num_indexes;
+ int validIndex;
Assert(!VARATT_IS_EXTERNAL(value));
/*
* Open the toast relation and its index. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
- * additional columns besides OID.
+ * additional columns besides OID. A toast table can have multiple identical
+ * indexes associated with it.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch the valid index used for this operation */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1397,7 +1446,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[validIndex]),
(AttrNumber) 1);
}
else
@@ -1451,8 +1500,8 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
- (AttrNumber) 1);
+ RelationGetRelid(toastidxs[validIndex]),
+ (AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
}
@@ -1472,6 +1521,8 @@ toast_save_datum(Relation rel, Datum value,
*/
while (data_todo > 0)
{
+ int i;
+
/*
* Calculate the size of this chunk
*/
@@ -1490,16 +1541,22 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ {
+ /* Only index relations marked as ready can be updated */
+ if (toastidxs[i]->rd_index->indisready)
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ }
/*
* Free memory
@@ -1514,9 +1571,9 @@ toast_save_datum(Relation rel, Datum value,
}
/*
- * Done - close toast relation
+ * Done - close toast relations
*/
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
@@ -1542,10 +1599,12 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ int num_indexes;
+ int validIndex;
if (!VARATT_IS_EXTERNAL_ONDISK(attr))
return;
@@ -1554,10 +1613,15 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch the valid index used for this operation */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1572,7 +1636,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1586,7 +1650,7 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
@@ -1603,6 +1667,15 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int num_indexes;
+ int validIndex;
+ Relation *toastidxs;
+
+ /* Fetch a valid index relation */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1615,14 +1688,18 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
- true, SnapshotToast, 1, &toastkey);
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(toastidxs[validIndex]),
+ true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
result = true;
systable_endscan(toastscan);
+ /* Clean up */
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
+
return result;
}
@@ -1659,7 +1736,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1674,6 +1751,8 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ int num_indexes;
+ int validIndex;
if (VARATT_IS_EXTERNAL_INDIRECT(attr))
elog(ERROR, "shouldn't be called for indirect tuples");
@@ -1692,11 +1771,16 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Fetch the valid index used for this operation */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1715,7 +1799,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1804,7 +1888,7 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
@@ -1821,7 +1905,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1844,6 +1928,8 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int validIndex;
Assert(VARATT_IS_EXTERNAL_ONDISK(attr));
@@ -1886,11 +1972,16 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Look for the valid index of the toast relation */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1931,7 +2022,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -2028,8 +2119,86 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
+
+/* ----------
+ * toast_open_indexes
+ *
+ * Get an array of the index relations associated with the given toast
+ * relation, and return the position within this array of the valid index
+ * used by the toast relation. It is the caller's responsibility to close
+ * the index relations and to free the array.
+ */
+static int
+toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of relation */
+ indexlist = RelationGetIndexList(toastrel);
+ Assert(indexlist != NIL);
+
+ *num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ *toastidxs = (Relation *) palloc(*num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ (*toastidxs)[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < *num_indexes; i++)
+ {
+ Relation toastidx = (*toastidxs)[i];
+ if (toastidx->rd_index->indisvalid)
+ {
+ res = i;
+ found = true;
+ break;
+ }
+ }
+
+ /*
+ * Free the index list; it is no longer needed as the index relations
+ * are opened and a valid one has been found.
+ */
+ list_free(indexlist);
+
+ /*
+ * The toast relation should have one valid index, so something is
+ * going wrong if there is nothing.
+ */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation with OID %u",
+ RelationGetRelid(toastrel));
+
+ return res;
+}
+
+/* ----------
+ * toast_close_indexes
+ *
+ * Close an array of indexes for a toast relation and free it. This should
+ * be called for a set of index relations opened previously with
+ * toast_open_indexes.
+ */
+static void
+toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
+{
+ int i;
+
+ /* Close relations and clean up things */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4fd42ed..f1cdef9 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -781,7 +781,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ca0c672..8525cb9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1072,7 +1072,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1254,7 +1253,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1764,8 +1762,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1781,8 +1777,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1876,15 +1873,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2072,14 +2060,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..d3086f4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ sum(pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
+ sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index f23730c..686770f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,6 +21,7 @@
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+#include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
@@ -1177,8 +1178,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1398,18 +1397,30 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * valid indexes. The swap can be done safely only if the relations
+ * have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
+ relform1->relkind == RELKIND_TOASTVALUE &&
+ relform2->relkind == RELKIND_TOASTVALUE)
+ {
+ Oid toastIndex1, toastIndex2;
+
+ /* Get valid index for each relation */
+ toastIndex1 = toast_get_valid_index(r1,
+ AccessExclusiveLock);
+ toastIndex2 = toast_get_valid_index(r2,
+ AccessExclusiveLock);
+
+ swap_relation_files(toastIndex1,
+ toastIndex2,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1533,14 +1544,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
- Relation toastrel;
Oid toastidx;
char NewToastName[NAMEDATALEN];
- toastrel = relation_open(newrel->rd_rel->reltoastrelid,
- AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
- relation_close(toastrel, AccessShareLock);
+ /* Get the associated valid index to be renamed */
+ toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
+ AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1548,9 +1557,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
+ /* ... and its valid index too. */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
+
RenameRelationInternal(toastidx,
NewToastName, true);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6a7aa44..6708725 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8878,7 +8878,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8886,6 +8885,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8932,7 +8933,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on the toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -9010,11 +9017,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Make sure the reltablespace change is visible */
CommandCounterIncrement();
- /* Move associated toast relation and/or index, too */
+ /* Move associated toast relation and/or indexes, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ ATExecSetTableSpace(lfirst_oid(lc), newTableSpace, lockmode);
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 3157aba..92396b3 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -579,8 +579,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -592,7 +592,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 5ddeffe..34482ab 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,9 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +352,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ indexlist = RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
+ list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 9ee9ea2..23e0373 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
"pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
/* only tables have toast tables, not indexes */
if (OidIsValid(pg_class_reltoastrelid))
{
+ PQExpBuffer index_query = createPQExpBuffer();
+ PGresult *index_res;
+ Oid indexrelid;
+
/*
* One complexity is that the table definition might not require
* the creation of a TOAST table, and the TOAST table might have
@@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
"SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
pg_class_reltoastrelid);
- /* every toast table has an index */
+ /* Every toast table has one valid index, so fetch it first... */
+ appendPQExpBuffer(index_query,
+ "SELECT c.indexrelid "
+ "FROM pg_catalog.pg_index c "
+ "WHERE c.indrelid = %u AND c.indisvalid;",
+ pg_class_reltoastrelid);
+ index_res = ExecuteSqlQueryForSingleRow(fout, index_query->data);
+ indexrelid = atooid(PQgetvalue(index_res, 0,
+ PQfnumber(index_res, "indexrelid")));
+
+ /* Then set it */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
+ indexrelid);
+
+ PQclear(index_res);
+ destroyPQExpBuffer(index_query);
}
}
else
@@ -13126,7 +13141,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
* attislocal correctly, plus fix up any inherited CHECK constraints.
* Analogously, we set up typed tables using ALTER TABLE / OF here.
*/
- if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
+ if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
tbinfo->relkind == RELKIND_FOREIGN_TABLE) )
{
for (j = 0; j < tbinfo->numatts; j++)
@@ -13151,7 +13166,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
else
appendPQExpBuffer(q, "ALTER FOREIGN TABLE %s ",
fmtId(tbinfo->dobj.name));
-
+
appendPQExpBuffer(q, "DROP COLUMN %s;\n",
fmtId(tbinfo->attnames[j]));
}
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index d0c17fd..110b954 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -15,6 +15,7 @@
#include "access/htup_details.h"
#include "utils/relcache.h"
+#include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
@@ -193,4 +194,12 @@ extern Size toast_raw_datum_size(Datum value);
*/
extern Size toast_datum_size(Datum value);
+/* ----------
+ * toast_get_valid_index -
+ *
+ * Return the OID of the valid index associated with a toast relation
+ * ----------
+ */
+extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index d46fe9e..9358e95 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201306121
+#define CATALOG_VERSION_NO 201307031
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 2225787..49c4f6f 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -94,7 +93,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 29
+#define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,23 +106,22 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relfrozenxid 26
-#define Anum_pg_class_relminmxid 27
-#define Anum_pg_class_relacl 28
-#define Anum_pg_class_reloptions 29
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relispopulated 24
+#define Anum_pg_class_relfrozenxid 25
+#define Anum_pg_class_relminmxid 26
+#define Anum_pg_class_relacl 27
+#define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
@@ -138,13 +136,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..a6444a0 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
+ | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indexrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index d5a3571..6361297 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
On 2013-07-03 10:03:26 +0900, Michael Paquier wrote:
+static int
+toast_open_indexes(Relation toastrel,
+		   LOCKMODE lock,
+		   Relation **toastidxs,
+		   int *num_indexes)

+	/*
+	 * Free index list, not necessary as relations are opened and a valid index
+	 * has been found.
+	 */
+	list_free(indexlist);
Missing "anymore" or such.
index 9ee9ea2..23e0373 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2778,10 +2778,9 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 	PQExpBuffer upgrade_query = createPQExpBuffer();
 	PGresult   *upgrade_res;
 	Oid		pg_class_reltoastrelid;
-	Oid		pg_class_reltoastidxid;

 	appendPQExpBuffer(upgrade_query,
-			  "SELECT c.reltoastrelid, t.reltoastidxid "
+			  "SELECT c.reltoastrelid "
 			  "FROM pg_catalog.pg_class c LEFT JOIN "
 			  "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
 			  "WHERE c.oid = '%u'::pg_catalog.oid;",
@@ -2790,7 +2789,6 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 	upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);

 	pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
-	pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));

 	appendPQExpBuffer(upgrade_buffer,
 			  "\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2803,6 +2801,10 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 	/* only tables have toast tables, not indexes */
 	if (OidIsValid(pg_class_reltoastrelid))
 	{
+		PQExpBuffer index_query = createPQExpBuffer();
+		PGresult   *index_res;
+		Oid		indexrelid;
+
 		/*
 		 * One complexity is that the table definition might not require
 		 * the creation of a TOAST table, and the TOAST table might have
@@ -2816,10 +2818,23 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
 				  "SELECT binary_upgrade.set_next_toast_pg_class_oid('%u'::pg_catalog.oid);\n",
 				  pg_class_reltoastrelid);

-		/* every toast table has an index */
+		/* Every toast table has one valid index, so fetch it first... */
+		appendPQExpBuffer(index_query,
+				  "SELECT c.indexrelid "
+				  "FROM pg_catalog.pg_index c "
+				  "WHERE c.indrelid = %u AND c.indisvalid;",
+				  pg_class_reltoastrelid);
+		index_res = ExecuteSqlQueryForSingleRow(fout, index_query->data);
+		indexrelid = atooid(PQgetvalue(index_res, 0,
+					       PQfnumber(index_res, "indexrelid")));
+
+		/* Then set it */
 		appendPQExpBuffer(upgrade_buffer,
 				  "SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
-				  pg_class_reltoastidxid);
+				  indexrelid);
+
+		PQclear(index_res);
+		destroyPQExpBuffer(index_query);
Wouldn't it make more sense to fetch the toast index oid in the query
on top instead of making a query for every relation?
Looking good!
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Wed, Jul 3, 2013 at 11:16 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-07-03 10:03:26 +0900, Michael Paquier wrote:
Wouldn't it make more sense to fetch the toast index oid in the query
on top instead of making a query for every relation?
With something like a CASE condition in the upper query for
reltoastrelid? This code path is taken not only by indexes but also by
tables, so I thought it was cleaner and more readable to fetch the
index OID only when necessary, as a separate query.
Regards,
--
Michael
On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
Wouldn't it make more sense to fetch the toast index oid in the query
on top instead of making a query for every relation?

With something like a CASE condition in the upper query for
reltoastrelid? This code path is taken not only by indexes but also by
tables, so I thought it was cleaner and more readable to fetch the
index OID only when necessary, as a separate query.
A left join should do the trick?
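Concretely, the left-join idea could fold the toast index lookup into the existing per-table query along these lines (a sketch only; the exact columns and the join condition on indisvalid are assumptions, not the committed query):

```sql
-- Hypothetical consolidated query for binary_upgrade_set_pg_class_oids():
-- fetch the toast table and its one valid index in a single round trip.
-- The %u placeholder is filled with the table's pg_class OID, pg_dump style.
SELECT c.reltoastrelid, i.indexrelid
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_index i
    ON (c.reltoastrelid = i.indrelid AND i.indisvalid)
WHERE c.oid = '%u'::pg_catalog.oid;
```

Because of the LEFT JOIN, a table with no TOAST relation still returns exactly one row (with indexrelid NULL), so the caller's single-row expectation is preserved.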
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jul 4, 2013 at 2:36 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
Wouldn't it make more sense to fetch the toast index oid in the query
on top instead of making a query for every relation?
+1
I changed the query that way. An updated version of the patch is attached.
I also updated rules.out, because Michael changed system_views.sql;
otherwise the regression test would fail.
Will commit this patch.
Regards,
--
Fujii Masao
Attachments:
20130704_1_remove_reltoastidxid_v18.patch (application/octet-stream)
diff --git a/contrib/pg_upgrade/info.c b/contrib/pg_upgrade/info.c
index c381f11..18daf1c 100644
--- a/contrib/pg_upgrade/info.c
+++ b/contrib/pg_upgrade/info.c
@@ -321,12 +321,19 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
"INSERT INTO info_rels "
"SELECT reltoastrelid "
"FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u", InvalidOid));
PQclear(executeQueryOrDie(conn,
"INSERT INTO info_rels "
- "SELECT reltoastidxid "
- "FROM info_rels i JOIN pg_catalog.pg_class c "
- " ON i.reloid = c.oid"));
+ "SELECT indexrelid "
+ "FROM pg_index "
+ "WHERE indisvalid "
+ " AND indrelid IN (SELECT reltoastrelid "
+ " FROM info_rels i "
+ " JOIN pg_catalog.pg_class c "
+ " ON i.reloid = c.oid "
+ " AND c.reltoastrelid != %u)",
+ InvalidOid));
snprintf(query, sizeof(query),
"SELECT c.oid, n.nspname, c.relname, "
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 09f7e40..6715782 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1745,15 +1745,6 @@
</row>
<row>
- <entry><structfield>reltoastidxid</structfield></entry>
- <entry><type>oid</type></entry>
- <entry><literal><link linkend="catalog-pg-class"><structname>pg_class</structname></link>.oid</literal></entry>
- <entry>
- For a TOAST table, the OID of its index. 0 if not a TOAST table.
- </entry>
- </row>
-
- <row>
<entry><structfield>relhasindex</structfield></entry>
<entry><type>bool</type></entry>
<entry></entry>
diff --git a/doc/src/sgml/diskusage.sgml b/doc/src/sgml/diskusage.sgml
index de1d0b4..461deb9 100644
--- a/doc/src/sgml/diskusage.sgml
+++ b/doc/src/sgml/diskusage.sgml
@@ -20,12 +20,12 @@
stored. If the table has any columns with potentially-wide values,
there also might be a <acronym>TOAST</> file associated with the table,
which is used to store values too wide to fit comfortably in the main
- table (see <xref linkend="storage-toast">). There will be one index on the
- <acronym>TOAST</> table, if present. There also might be indexes associated
- with the base table. Each table and index is stored in a separate disk
- file — possibly more than one file, if the file would exceed one
- gigabyte. Naming conventions for these files are described in <xref
- linkend="storage-file-layout">.
+ table (see <xref linkend="storage-toast">). There will be one valid index
+ on the <acronym>TOAST</> table, if present. There also might be indexes
+ associated with the base table. Each table and index is stored in a
+ separate disk file — possibly more than one file, if the file would
+ exceed one gigabyte. Naming conventions for these files are described
+ in <xref linkend="storage-file-layout">.
</para>
<para>
@@ -44,7 +44,7 @@
<programlisting>
SELECT pg_relation_filepath(oid), relpages FROM pg_class WHERE relname = 'customer';
- pg_relation_filepath | relpages
+ pg_relation_filepath | relpages
----------------------+----------
base/16384/16806 | 60
(1 row)
@@ -65,12 +65,12 @@ FROM pg_class,
FROM pg_class
WHERE relname = 'customer') AS ss
WHERE oid = ss.reltoastrelid OR
- oid = (SELECT reltoastidxid
- FROM pg_class
- WHERE oid = ss.reltoastrelid)
+ oid = (SELECT indexrelid
+ FROM pg_index
+ WHERE indrelid = ss.reltoastrelid)
ORDER BY relname;
- relname | relpages
+ relname | relpages
----------------------+----------
pg_toast_16806 | 0
pg_toast_16806_index | 1
@@ -87,7 +87,7 @@ WHERE c.relname = 'customer' AND
c2.oid = i.indexrelid
ORDER BY c2.relname;
- relname | relpages
+ relname | relpages
----------------------+----------
customer_id_indexdex | 26
</programlisting>
@@ -101,7 +101,7 @@ SELECT relname, relpages
FROM pg_class
ORDER BY relpages DESC;
- relname | relpages
+ relname | relpages
----------------------+----------
bigtable | 3290
customer | 3144
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index b37b6c3..d38c009 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1163,12 +1163,12 @@ postgres: <replaceable>user</> <replaceable>database</> <replaceable>host</> <re
<row>
<entry><structfield>tidx_blks_read</></entry>
<entry><type>bigint</></entry>
- <entry>Number of disk blocks read from this table's TOAST table index (if any)</entry>
+ <entry>Number of disk blocks read from this table's TOAST table indexes (if any)</entry>
</row>
<row>
<entry><structfield>tidx_blks_hit</></entry>
<entry><type>bigint</></entry>
- <entry>Number of buffer hits in this table's TOAST table index (if any)</entry>
+ <entry>Number of buffer hits in this table's TOAST table indexes (if any)</entry>
</row>
</tbody>
</tgroup>
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 445a7ed..675bfcc 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -78,6 +78,12 @@ static bool toastid_valueid_exists(Oid toastrelid, Oid valueid);
static struct varlena *toast_fetch_datum(struct varlena * attr);
static struct varlena *toast_fetch_datum_slice(struct varlena * attr,
int32 sliceoffset, int32 length);
+static int toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes);
+static void toast_close_indexes(Relation *toastidxs, int num_indexes,
+ LOCKMODE lock);
/* ----------
@@ -1287,6 +1293,39 @@ toast_compress_datum(Datum value)
/* ----------
+ * toast_get_valid_index
+ *
+ * Get OID of valid index associated to given toast relation. A toast
+ * relation can have only one valid index at the same time.
+ */
+Oid
+toast_get_valid_index(Oid toastoid, LOCKMODE lock)
+{
+ int num_indexes;
+ int validIndex;
+ Oid validIndexOid;
+ Relation *toastidxs;
+ Relation toastrel;
+
+ /* Open the toast relation */
+ toastrel = heap_open(toastoid, lock);
+
+ /* Look for the valid index of the toast relation */
+ validIndex = toast_open_indexes(toastrel,
+ lock,
+ &toastidxs,
+ &num_indexes);
+ validIndexOid = RelationGetRelid(toastidxs[validIndex]);
+
+ /* Close the toast relation and all its indexes */
+ toast_close_indexes(toastidxs, num_indexes, lock);
+ heap_close(toastrel, lock);
+
+ return validIndexOid;
+}
+
+
+/* ----------
* toast_save_datum -
*
* Save one single datum into the secondary relation and return
@@ -1303,7 +1342,7 @@ toast_save_datum(Relation rel, Datum value,
struct varlena * oldexternal, int options)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
HeapTuple toasttup;
TupleDesc toasttupDesc;
Datum t_values[3];
@@ -1322,17 +1361,24 @@ toast_save_datum(Relation rel, Datum value,
char *data_p;
int32 data_todo;
Pointer dval = DatumGetPointer(value);
+ int num_indexes;
+ int validIndex;
Assert(!VARATT_IS_EXTERNAL(value));
/*
- * Open the toast relation and its index. We can use the index to check
+ * Open the toast relation and its indexes. We can use the index to check
* uniqueness of the OID we assign to the toasted item, even though it has
* additional columns besides OID.
*/
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Open all the toast indexes and look for the valid */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Get the data pointer and length, and compute va_rawsize and va_extsize.
@@ -1397,7 +1443,7 @@ toast_save_datum(Relation rel, Datum value,
/* normal case: just choose an unused OID */
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[validIndex]),
(AttrNumber) 1);
}
else
@@ -1451,7 +1497,7 @@ toast_save_datum(Relation rel, Datum value,
{
toast_pointer.va_valueid =
GetNewOidWithIndex(toastrel,
- RelationGetRelid(toastidx),
+ RelationGetRelid(toastidxs[validIndex]),
(AttrNumber) 1);
} while (toastid_valueid_exists(rel->rd_toastoid,
toast_pointer.va_valueid));
@@ -1472,6 +1518,8 @@ toast_save_datum(Relation rel, Datum value,
*/
while (data_todo > 0)
{
+ int i;
+
/*
* Calculate the size of this chunk
*/
@@ -1490,16 +1538,22 @@ toast_save_datum(Relation rel, Datum value,
/*
* Create the index entry. We cheat a little here by not using
* FormIndexDatum: this relies on the knowledge that the index columns
- * are the same as the initial columns of the table.
+ * are the same as the initial columns of the table for all the
+ * indexes.
*
* Note also that there had better not be any user-created index on
* the TOAST table, since we don't bother to update anything else.
*/
- index_insert(toastidx, t_values, t_isnull,
- &(toasttup->t_self),
- toastrel,
- toastidx->rd_index->indisunique ?
- UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ for (i = 0; i < num_indexes; i++)
+ {
+ /* Only index relations marked as ready can be updated */
+ if (IndexIsReady(toastidxs[i]->rd_index))
+ index_insert(toastidxs[i], t_values, t_isnull,
+ &(toasttup->t_self),
+ toastrel,
+ toastidxs[i]->rd_index->indisunique ?
+ UNIQUE_CHECK_YES : UNIQUE_CHECK_NO);
+ }
/*
* Free memory
@@ -1514,9 +1568,9 @@ toast_save_datum(Relation rel, Datum value,
}
/*
- * Done - close toast relation
+ * Done - close toast relation and its indexes
*/
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
/*
@@ -1542,10 +1596,12 @@ toast_delete_datum(Relation rel, Datum value)
struct varlena *attr = (struct varlena *) DatumGetPointer(value);
struct varatt_external toast_pointer;
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple toasttup;
+ int num_indexes;
+ int validIndex;
if (!VARATT_IS_EXTERNAL_ONDISK(attr))
return;
@@ -1554,10 +1610,15 @@ toast_delete_datum(Relation rel, Datum value)
VARATT_EXTERNAL_GET_POINTER(toast_pointer, attr);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, RowExclusiveLock);
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, RowExclusiveLock);
+
+ /* Fetch valid relation used for process */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1572,7 +1633,7 @@ toast_delete_datum(Relation rel, Datum value)
* sequence or not, but since we've already locked the index we might as
* well use systable_beginscan_ordered.)
*/
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((toasttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1586,7 +1647,7 @@ toast_delete_datum(Relation rel, Datum value)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, RowExclusiveLock);
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
heap_close(toastrel, RowExclusiveLock);
}
@@ -1603,6 +1664,15 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
bool result = false;
ScanKeyData toastkey;
SysScanDesc toastscan;
+ int num_indexes;
+ int validIndex;
+ Relation *toastidxs;
+
+ /* Fetch a valid index relation */
+ validIndex = toast_open_indexes(toastrel,
+ RowExclusiveLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to find chunks with matching va_valueid
@@ -1615,14 +1685,18 @@ toastrel_valueid_exists(Relation toastrel, Oid valueid)
/*
* Is there any such chunk?
*/
- toastscan = systable_beginscan(toastrel, toastrel->rd_rel->reltoastidxid,
- true, SnapshotToast, 1, &toastkey);
+ toastscan = systable_beginscan(toastrel,
+ RelationGetRelid(toastidxs[validIndex]),
+ true, SnapshotToast, 1, &toastkey);
if (systable_getnext(toastscan) != NULL)
result = true;
systable_endscan(toastscan);
+ /* Clean up */
+ toast_close_indexes(toastidxs, num_indexes, RowExclusiveLock);
+
return result;
}
@@ -1659,7 +1733,7 @@ static struct varlena *
toast_fetch_datum(struct varlena * attr)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey;
SysScanDesc toastscan;
HeapTuple ttup;
@@ -1674,6 +1748,8 @@ toast_fetch_datum(struct varlena * attr)
bool isnull;
char *chunkdata;
int32 chunksize;
+ int num_indexes;
+ int validIndex;
if (VARATT_IS_EXTERNAL_INDIRECT(attr))
elog(ERROR, "shouldn't be called for indirect tuples");
@@ -1692,11 +1768,16 @@ toast_fetch_datum(struct varlena * attr)
SET_VARSIZE(result, ressize + VARHDRSZ);
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Look for the valid index of the toast relation */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index by va_valueid
@@ -1715,7 +1796,7 @@ toast_fetch_datum(struct varlena * attr)
*/
nextidx = 0;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, 1, &toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -1804,7 +1885,7 @@ toast_fetch_datum(struct varlena * attr)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
@@ -1821,7 +1902,7 @@ static struct varlena *
toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
{
Relation toastrel;
- Relation toastidx;
+ Relation *toastidxs;
ScanKeyData toastkey[3];
int nscankeys;
SysScanDesc toastscan;
@@ -1844,6 +1925,8 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
int32 chunksize;
int32 chcpystrt;
int32 chcpyend;
+ int num_indexes;
+ int validIndex;
Assert(VARATT_IS_EXTERNAL_ONDISK(attr));
@@ -1886,11 +1969,16 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
endoffset = (sliceoffset + length - 1) % TOAST_MAX_CHUNK_SIZE;
/*
- * Open the toast relation and its index
+ * Open the toast relation and its indexes
*/
toastrel = heap_open(toast_pointer.va_toastrelid, AccessShareLock);
toasttupDesc = toastrel->rd_att;
- toastidx = index_open(toastrel->rd_rel->reltoastidxid, AccessShareLock);
+
+ /* Look for the valid index of toast relation */
+ validIndex = toast_open_indexes(toastrel,
+ AccessShareLock,
+ &toastidxs,
+ &num_indexes);
/*
* Setup a scan key to fetch from the index. This is either two keys or
@@ -1931,7 +2019,7 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* The index is on (valueid, chunkidx) so they will come in order
*/
nextidx = startchunk;
- toastscan = systable_beginscan_ordered(toastrel, toastidx,
+ toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
SnapshotToast, nscankeys, toastkey);
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
@@ -2028,8 +2116,85 @@ toast_fetch_datum_slice(struct varlena * attr, int32 sliceoffset, int32 length)
* End scan and close relations
*/
systable_endscan_ordered(toastscan);
- index_close(toastidx, AccessShareLock);
+ toast_close_indexes(toastidxs, num_indexes, AccessShareLock);
heap_close(toastrel, AccessShareLock);
return result;
}
+
+/* ----------
+ * toast_open_indexes
+ *
+ * Get an array of the indexes associated to the given toast relation
+ * and return as well the position of the valid index used by the toast
+ * relation in this array. It is the responsibility of the caller of this
+ * function to close the indexes as well as free them.
+ */
+static int
+toast_open_indexes(Relation toastrel,
+ LOCKMODE lock,
+ Relation **toastidxs,
+ int *num_indexes)
+{
+ int i = 0;
+ int res = 0;
+ bool found = false;
+ List *indexlist;
+ ListCell *lc;
+
+ /* Get index list of the toast relation */
+ indexlist = RelationGetIndexList(toastrel);
+ Assert(indexlist != NIL);
+
+ *num_indexes = list_length(indexlist);
+
+ /* Open all the index relations */
+ *toastidxs = (Relation *) palloc(*num_indexes * sizeof(Relation));
+ foreach(lc, indexlist)
+ (*toastidxs)[i++] = index_open(lfirst_oid(lc), lock);
+
+ /* Fetch the first valid index in list */
+ for (i = 0; i < *num_indexes; i++)
+ {
+ Relation toastidx = (*toastidxs)[i];
+ if (toastidx->rd_index->indisvalid)
+ {
+ res = i;
+ found = true;
+ break;
+ }
+ }
+
+ /*
+ * Free index list, not necessary anymore as relations are opened
+ * and a valid index has been found.
+ */
+ list_free(indexlist);
+
+ /*
+ * The toast relation should have one valid index, so something is
+ * wrong if none is found.
+ */
+ if (!found)
+ elog(ERROR, "no valid index found for toast relation with Oid %u",
+ RelationGetRelid(toastrel));
+
+ return res;
+}
+
+/* ----------
+ * toast_close_indexes
+ *
+ * Close an array of indexes for a toast relation and free it. This should
+ * be called for a set of indexes opened previously with toast_open_indexes.
+ */
+static void
+toast_close_indexes(Relation *toastidxs, int num_indexes, LOCKMODE lock)
+{
+ int i;
+
+ /* Close relations and clean up things */
+ for (i = 0; i < num_indexes; i++)
+ index_close(toastidxs[i], lock);
+ pfree(toastidxs);
+}
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 4fd42ed..f1cdef9 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -781,7 +781,6 @@ InsertPgClassTuple(Relation pg_class_desc,
values[Anum_pg_class_reltuples - 1] = Float4GetDatum(rd_rel->reltuples);
values[Anum_pg_class_relallvisible - 1] = Int32GetDatum(rd_rel->relallvisible);
values[Anum_pg_class_reltoastrelid - 1] = ObjectIdGetDatum(rd_rel->reltoastrelid);
- values[Anum_pg_class_reltoastidxid - 1] = ObjectIdGetDatum(rd_rel->reltoastidxid);
values[Anum_pg_class_relhasindex - 1] = BoolGetDatum(rd_rel->relhasindex);
values[Anum_pg_class_relisshared - 1] = BoolGetDatum(rd_rel->relisshared);
values[Anum_pg_class_relpersistence - 1] = CharGetDatum(rd_rel->relpersistence);
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ca0c672..8525cb9 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -103,7 +103,7 @@ static void UpdateIndexRelation(Oid indexoid, Oid heapoid,
bool isvalid);
static void index_update_stats(Relation rel,
bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples);
+ double reltuples);
static void IndexCheckExclusion(Relation heapRelation,
Relation indexRelation,
IndexInfo *indexInfo);
@@ -1072,7 +1072,6 @@ index_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- InvalidOid,
-1.0);
/* Make the above update visible */
CommandCounterIncrement();
@@ -1254,7 +1253,6 @@ index_constraint_create(Relation heapRelation,
index_update_stats(heapRelation,
true,
true,
- InvalidOid,
-1.0);
/*
@@ -1764,8 +1762,6 @@ FormIndexDatum(IndexInfo *indexInfo,
*
* hasindex: set relhasindex to this value
* isprimary: if true, set relhaspkey true; else no change
- * reltoastidxid: if not InvalidOid, set reltoastidxid to this value;
- * else no change
* reltuples: if >= 0, set reltuples to this value; else no change
*
* If reltuples >= 0, relpages and relallvisible are also updated (using
@@ -1781,8 +1777,9 @@ FormIndexDatum(IndexInfo *indexInfo,
*/
static void
index_update_stats(Relation rel,
- bool hasindex, bool isprimary,
- Oid reltoastidxid, double reltuples)
+ bool hasindex,
+ bool isprimary,
+ double reltuples)
{
Oid relid = RelationGetRelid(rel);
Relation pg_class;
@@ -1876,15 +1873,6 @@ index_update_stats(Relation rel,
dirty = true;
}
}
- if (OidIsValid(reltoastidxid))
- {
- Assert(rd_rel->relkind == RELKIND_TOASTVALUE);
- if (rd_rel->reltoastidxid != reltoastidxid)
- {
- rd_rel->reltoastidxid = reltoastidxid;
- dirty = true;
- }
- }
if (reltuples >= 0)
{
@@ -2072,14 +2060,11 @@ index_build(Relation heapRelation,
index_update_stats(heapRelation,
true,
isprimary,
- (heapRelation->rd_rel->relkind == RELKIND_TOASTVALUE) ?
- RelationGetRelid(indexRelation) : InvalidOid,
stats->heap_tuples);
index_update_stats(indexRelation,
false,
false,
- InvalidOid,
stats->index_tuples);
/* Make the updated catalog row versions visible */
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 81d7c4f..d3086f4 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -473,16 +473,16 @@ CREATE VIEW pg_statio_all_tables AS
pg_stat_get_blocks_fetched(T.oid) -
pg_stat_get_blocks_hit(T.oid) AS toast_blks_read,
pg_stat_get_blocks_hit(T.oid) AS toast_blks_hit,
- pg_stat_get_blocks_fetched(X.oid) -
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_read,
- pg_stat_get_blocks_hit(X.oid) AS tidx_blks_hit
+ sum(pg_stat_get_blocks_fetched(X.indexrelid) -
+ pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_read,
+ sum(pg_stat_get_blocks_hit(X.indexrelid))::bigint AS tidx_blks_hit
FROM pg_class C LEFT JOIN
pg_index I ON C.oid = I.indrelid LEFT JOIN
pg_class T ON C.reltoastrelid = T.oid LEFT JOIN
- pg_class X ON T.reltoastidxid = X.oid
+ pg_index X ON T.oid = X.indrelid
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE C.relkind IN ('r', 't', 'm')
- GROUP BY C.oid, N.nspname, C.relname, T.oid, X.oid;
+ GROUP BY C.oid, N.nspname, C.relname, T.oid, X.indrelid;
CREATE VIEW pg_statio_sys_tables AS
SELECT * FROM pg_statio_all_tables
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index f23730c..686770f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -21,6 +21,7 @@
#include "access/relscan.h"
#include "access/rewriteheap.h"
#include "access/transam.h"
+#include "access/tuptoaster.h"
#include "access/xact.h"
#include "catalog/catalog.h"
#include "catalog/dependency.h"
@@ -1177,8 +1178,6 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
swaptemp = relform1->reltoastrelid;
relform1->reltoastrelid = relform2->reltoastrelid;
relform2->reltoastrelid = swaptemp;
-
- /* we should NOT swap reltoastidxid */
}
}
else
@@ -1398,18 +1397,30 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
/*
* If we're swapping two toast tables by content, do the same for their
- * indexes.
+ * valid index. The swap can actually be safely done only if the relations
+ * have indexes.
*/
if (swap_toast_by_content &&
- relform1->reltoastidxid && relform2->reltoastidxid)
- swap_relation_files(relform1->reltoastidxid,
- relform2->reltoastidxid,
+ relform1->relkind == RELKIND_TOASTVALUE &&
+ relform2->relkind == RELKIND_TOASTVALUE)
+ {
+ Oid toastIndex1, toastIndex2;
+
+ /* Get valid index for each relation */
+ toastIndex1 = toast_get_valid_index(r1,
+ AccessExclusiveLock);
+ toastIndex2 = toast_get_valid_index(r2,
+ AccessExclusiveLock);
+
+ swap_relation_files(toastIndex1,
+ toastIndex2,
target_is_pg_class,
swap_toast_by_content,
is_internal,
InvalidTransactionId,
InvalidMultiXactId,
mapped_tables);
+ }
/* Clean up. */
heap_freetuple(reltup1);
@@ -1533,14 +1544,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
newrel = heap_open(OIDOldHeap, NoLock);
if (OidIsValid(newrel->rd_rel->reltoastrelid))
{
- Relation toastrel;
Oid toastidx;
char NewToastName[NAMEDATALEN];
- toastrel = relation_open(newrel->rd_rel->reltoastrelid,
- AccessShareLock);
- toastidx = toastrel->rd_rel->reltoastidxid;
- relation_close(toastrel, AccessShareLock);
+ /* Get the associated valid index to be renamed */
+ toastidx = toast_get_valid_index(newrel->rd_rel->reltoastrelid,
+ AccessShareLock);
/* rename the toast table ... */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u",
@@ -1548,9 +1557,10 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
RenameRelationInternal(newrel->rd_rel->reltoastrelid,
NewToastName, true);
- /* ... and its index too */
+ /* ... and its valid index too. */
snprintf(NewToastName, NAMEDATALEN, "pg_toast_%u_index",
OIDOldHeap);
+
RenameRelationInternal(toastidx,
NewToastName, true);
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6a7aa44..6708725 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -8878,7 +8878,6 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
Relation rel;
Oid oldTableSpace;
Oid reltoastrelid;
- Oid reltoastidxid;
Oid newrelfilenode;
RelFileNode newrnode;
SMgrRelation dstrel;
@@ -8886,6 +8885,8 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
HeapTuple tuple;
Form_pg_class rd_rel;
ForkNumber forkNum;
+ List *reltoastidxids = NIL;
+ ListCell *lc;
/*
* Need lock here in case we are recursing to toast table or index
@@ -8932,7 +8933,13 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
errmsg("cannot move temporary tables of other sessions")));
reltoastrelid = rel->rd_rel->reltoastrelid;
- reltoastidxid = rel->rd_rel->reltoastidxid;
+ /* Fetch the list of indexes on toast relation if necessary */
+ if (OidIsValid(reltoastrelid))
+ {
+ Relation toastRel = relation_open(reltoastrelid, lockmode);
+ reltoastidxids = RelationGetIndexList(toastRel);
+ relation_close(toastRel, lockmode);
+ }
/* Get a modifiable copy of the relation's pg_class row */
pg_class = heap_open(RelationRelationId, RowExclusiveLock);
@@ -9010,11 +9017,14 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
/* Make sure the reltablespace change is visible */
CommandCounterIncrement();
- /* Move associated toast relation and/or index, too */
+ /* Move associated toast relation and/or indexes, too */
if (OidIsValid(reltoastrelid))
ATExecSetTableSpace(reltoastrelid, newTableSpace, lockmode);
- if (OidIsValid(reltoastidxid))
- ATExecSetTableSpace(reltoastidxid, newTableSpace, lockmode);
+ foreach(lc, reltoastidxids)
+ ATExecSetTableSpace(lfirst_oid(lc), newTableSpace, lockmode);
+
+ /* Clean up */
+ list_free(reltoastidxids);
}
/*
diff --git a/src/backend/rewrite/rewriteDefine.c b/src/backend/rewrite/rewriteDefine.c
index 3157aba..92396b3 100644
--- a/src/backend/rewrite/rewriteDefine.c
+++ b/src/backend/rewrite/rewriteDefine.c
@@ -579,8 +579,8 @@ DefineQueryRewrite(char *rulename,
/*
* Fix pg_class entry to look like a normal view's, including setting
- * the correct relkind and removal of reltoastrelid/reltoastidxid of
- * the toast table we potentially removed above.
+ * the correct relkind and removal of reltoastrelid of the toast table
+ * we potentially removed above.
*/
classTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(event_relid));
if (!HeapTupleIsValid(classTup))
@@ -592,7 +592,6 @@ DefineQueryRewrite(char *rulename,
classForm->reltuples = 0;
classForm->relallvisible = 0;
classForm->reltoastrelid = InvalidOid;
- classForm->reltoastidxid = InvalidOid;
classForm->relhasindex = false;
classForm->relkind = RELKIND_VIEW;
classForm->relhasoids = false;
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
index 5ddeffe..34482ab 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -332,7 +332,7 @@ pg_relation_size(PG_FUNCTION_ARGS)
}
/*
- * Calculate total on-disk size of a TOAST relation, including its index.
+ * Calculate total on-disk size of a TOAST relation, including its indexes.
* Must not be applied to non-TOAST relations.
*/
static int64
@@ -340,8 +340,9 @@ calculate_toast_table_size(Oid toastrelid)
{
int64 size = 0;
Relation toastRel;
- Relation toastIdxRel;
ForkNumber forkNum;
+ ListCell *lc;
+ List *indexlist;
toastRel = relation_open(toastrelid, AccessShareLock);
@@ -351,12 +352,21 @@ calculate_toast_table_size(Oid toastrelid)
toastRel->rd_backend, forkNum);
/* toast index size, including FSM and VM size */
- toastIdxRel = relation_open(toastRel->rd_rel->reltoastidxid, AccessShareLock);
- for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
- size += calculate_relation_size(&(toastIdxRel->rd_node),
- toastIdxRel->rd_backend, forkNum);
+ indexlist = RelationGetIndexList(toastRel);
- relation_close(toastIdxRel, AccessShareLock);
+ /* Size is calculated using all the indexes available */
+ foreach(lc, indexlist)
+ {
+ Relation toastIdxRel;
+ toastIdxRel = relation_open(lfirst_oid(lc),
+ AccessShareLock);
+ for (forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++)
+ size += calculate_relation_size(&(toastIdxRel->rd_node),
+ toastIdxRel->rd_backend, forkNum);
+
+ relation_close(toastIdxRel, AccessShareLock);
+ }
+ list_free(indexlist);
relation_close(toastRel, AccessShareLock);
return size;
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 9ee9ea2..f40961f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -2778,19 +2778,19 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
PQExpBuffer upgrade_query = createPQExpBuffer();
PGresult *upgrade_res;
Oid pg_class_reltoastrelid;
- Oid pg_class_reltoastidxid;
+ Oid pg_index_indexrelid;
appendPQExpBuffer(upgrade_query,
- "SELECT c.reltoastrelid, t.reltoastidxid "
+ "SELECT c.reltoastrelid, i.indexrelid "
"FROM pg_catalog.pg_class c LEFT JOIN "
- "pg_catalog.pg_class t ON (c.reltoastrelid = t.oid) "
+ "pg_catalog.pg_index i ON (c.reltoastrelid = i.indrelid AND i.indisvalid) "
"WHERE c.oid = '%u'::pg_catalog.oid;",
pg_class_oid);
upgrade_res = ExecuteSqlQueryForSingleRow(fout, upgrade_query->data);
pg_class_reltoastrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastrelid")));
- pg_class_reltoastidxid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "reltoastidxid")));
+ pg_index_indexrelid = atooid(PQgetvalue(upgrade_res, 0, PQfnumber(upgrade_res, "indexrelid")));
appendPQExpBuffer(upgrade_buffer,
"\n-- For binary upgrade, must preserve pg_class oids\n");
@@ -2819,7 +2819,7 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
/* every toast table has an index */
appendPQExpBuffer(upgrade_buffer,
"SELECT binary_upgrade.set_next_index_pg_class_oid('%u'::pg_catalog.oid);\n",
- pg_class_reltoastidxid);
+ pg_index_indexrelid);
}
}
else
@@ -13126,7 +13126,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
* attislocal correctly, plus fix up any inherited CHECK constraints.
* Analogously, we set up typed tables using ALTER TABLE / OF here.
*/
- if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
+ if (binary_upgrade && (tbinfo->relkind == RELKIND_RELATION ||
tbinfo->relkind == RELKIND_FOREIGN_TABLE) )
{
for (j = 0; j < tbinfo->numatts; j++)
@@ -13151,7 +13151,7 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
else
appendPQExpBuffer(q, "ALTER FOREIGN TABLE %s ",
fmtId(tbinfo->dobj.name));
-
+
appendPQExpBuffer(q, "DROP COLUMN %s;\n",
fmtId(tbinfo->attnames[j]));
}
diff --git a/src/include/access/tuptoaster.h b/src/include/access/tuptoaster.h
index d0c17fd..b4e0242 100644
--- a/src/include/access/tuptoaster.h
+++ b/src/include/access/tuptoaster.h
@@ -15,6 +15,7 @@
#include "access/htup_details.h"
#include "utils/relcache.h"
+#include "storage/lock.h"
/*
* This enables de-toasting of index entries. Needed until VACUUM is
@@ -193,4 +194,12 @@ extern Size toast_raw_datum_size(Datum value);
*/
extern Size toast_datum_size(Datum value);
+/* ----------
+ * toast_get_valid_index -
+ *
+ * Return OID of valid index associated to a toast relation
+ * ----------
+ */
+extern Oid toast_get_valid_index(Oid toastoid, LOCKMODE lock);
+
#endif /* TUPTOASTER_H */
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index d46fe9e..9358e95 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -53,6 +53,6 @@
*/
/* yyyymmddN */
-#define CATALOG_VERSION_NO 201306121
+#define CATALOG_VERSION_NO 201307031
#endif
diff --git a/src/include/catalog/pg_class.h b/src/include/catalog/pg_class.h
index 2225787..49c4f6f 100644
--- a/src/include/catalog/pg_class.h
+++ b/src/include/catalog/pg_class.h
@@ -48,7 +48,6 @@ CATALOG(pg_class,1259) BKI_BOOTSTRAP BKI_ROWTYPE_OID(83) BKI_SCHEMA_MACRO
int32 relallvisible; /* # of all-visible blocks (not always
* up-to-date) */
Oid reltoastrelid; /* OID of toast table; 0 if none */
- Oid reltoastidxid; /* if toast table, OID of chunk_id index */
bool relhasindex; /* T if has (or has had) any indexes */
bool relisshared; /* T if shared across databases */
char relpersistence; /* see RELPERSISTENCE_xxx constants below */
@@ -94,7 +93,7 @@ typedef FormData_pg_class *Form_pg_class;
* ----------------
*/
-#define Natts_pg_class 29
+#define Natts_pg_class 28
#define Anum_pg_class_relname 1
#define Anum_pg_class_relnamespace 2
#define Anum_pg_class_reltype 3
@@ -107,23 +106,22 @@ typedef FormData_pg_class *Form_pg_class;
#define Anum_pg_class_reltuples 10
#define Anum_pg_class_relallvisible 11
#define Anum_pg_class_reltoastrelid 12
-#define Anum_pg_class_reltoastidxid 13
-#define Anum_pg_class_relhasindex 14
-#define Anum_pg_class_relisshared 15
-#define Anum_pg_class_relpersistence 16
-#define Anum_pg_class_relkind 17
-#define Anum_pg_class_relnatts 18
-#define Anum_pg_class_relchecks 19
-#define Anum_pg_class_relhasoids 20
-#define Anum_pg_class_relhaspkey 21
-#define Anum_pg_class_relhasrules 22
-#define Anum_pg_class_relhastriggers 23
-#define Anum_pg_class_relhassubclass 24
-#define Anum_pg_class_relispopulated 25
-#define Anum_pg_class_relfrozenxid 26
-#define Anum_pg_class_relminmxid 27
-#define Anum_pg_class_relacl 28
-#define Anum_pg_class_reloptions 29
+#define Anum_pg_class_relhasindex 13
+#define Anum_pg_class_relisshared 14
+#define Anum_pg_class_relpersistence 15
+#define Anum_pg_class_relkind 16
+#define Anum_pg_class_relnatts 17
+#define Anum_pg_class_relchecks 18
+#define Anum_pg_class_relhasoids 19
+#define Anum_pg_class_relhaspkey 20
+#define Anum_pg_class_relhasrules 21
+#define Anum_pg_class_relhastriggers 22
+#define Anum_pg_class_relhassubclass 23
+#define Anum_pg_class_relispopulated 24
+#define Anum_pg_class_relfrozenxid 25
+#define Anum_pg_class_relminmxid 26
+#define Anum_pg_class_relacl 27
+#define Anum_pg_class_reloptions 28
/* ----------------
* initial contents of pg_class
@@ -138,13 +136,13 @@ typedef FormData_pg_class *Form_pg_class;
* Note: "3" in the relfrozenxid column stands for FirstNormalTransactionId;
* similarly, "1" in relminmxid stands for FirstMultiXactId
*/
-DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1247 ( pg_type PGNSP 71 0 PGUID 0 0 0 0 0 0 0 f f p r 30 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1249 ( pg_attribute PGNSP 75 0 PGUID 0 0 0 0 0 0 0 f f p r 21 0 f f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1255 ( pg_proc PGNSP 81 0 PGUID 0 0 0 0 0 0 0 f f p r 27 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
-DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 0 f f p r 29 0 t f f f f t 3 1 _null_ _null_ ));
+DATA(insert OID = 1259 ( pg_class PGNSP 83 0 PGUID 0 0 0 0 0 0 0 f f p r 28 0 t f f f f t 3 1 _null_ _null_ ));
DESCR("");
diff --git a/src/test/regress/expected/oidjoins.out b/src/test/regress/expected/oidjoins.out
index 06ed856..6c5cb5a 100644
--- a/src/test/regress/expected/oidjoins.out
+++ b/src/test/regress/expected/oidjoins.out
@@ -353,14 +353,6 @@ WHERE reltoastrelid != 0 AND
------+---------------
(0 rows)
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
- ctid | reltoastidxid
-------+---------------
-(0 rows)
-
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 57ae842..4b182e7 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1852,15 +1852,15 @@ SELECT viewname, definition FROM pg_views WHERE schemaname <> 'information_schem
| (sum(pg_stat_get_blocks_hit(i.indexrelid)))::bigint AS idx_blks_hit, +
| (pg_stat_get_blocks_fetched(t.oid) - pg_stat_get_blocks_hit(t.oid)) AS toast_blks_read, +
| pg_stat_get_blocks_hit(t.oid) AS toast_blks_hit, +
- | (pg_stat_get_blocks_fetched(x.oid) - pg_stat_get_blocks_hit(x.oid)) AS tidx_blks_read, +
- | pg_stat_get_blocks_hit(x.oid) AS tidx_blks_hit +
+ | (sum((pg_stat_get_blocks_fetched(x.indexrelid) - pg_stat_get_blocks_hit(x.indexrelid))))::bigint AS tidx_blks_read, +
+ | (sum(pg_stat_get_blocks_hit(x.indexrelid)))::bigint AS tidx_blks_hit +
| FROM ((((pg_class c +
| LEFT JOIN pg_index i ON ((c.oid = i.indrelid))) +
| LEFT JOIN pg_class t ON ((c.reltoastrelid = t.oid))) +
- | LEFT JOIN pg_class x ON ((t.reltoastidxid = x.oid))) +
+ | LEFT JOIN pg_index x ON ((t.oid = x.indrelid))) +
| LEFT JOIN pg_namespace n ON ((n.oid = c.relnamespace))) +
| WHERE (c.relkind = ANY (ARRAY['r'::"char", 't'::"char", 'm'::"char"])) +
- | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.oid;
+ | GROUP BY c.oid, n.nspname, c.relname, t.oid, x.indrelid;
pg_statio_sys_indexes | SELECT pg_statio_all_indexes.relid, +
| pg_statio_all_indexes.indexrelid, +
| pg_statio_all_indexes.schemaname, +
@@ -2347,11 +2347,11 @@ select xmin, * from fooview; -- fail, views don't have such a column
ERROR: column "xmin" does not exist
LINE 1: select xmin, * from fooview;
^
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
- reltoastrelid | reltoastidxid | relkind | relfrozenxid
----------------+---------------+---------+--------------
- 0 | 0 | v | 0
+ reltoastrelid | relkind | relfrozenxid
+---------------+---------+--------------
+ 0 | v | 0
(1 row)
drop view fooview;
diff --git a/src/test/regress/sql/oidjoins.sql b/src/test/regress/sql/oidjoins.sql
index 6422da2..9b91683 100644
--- a/src/test/regress/sql/oidjoins.sql
+++ b/src/test/regress/sql/oidjoins.sql
@@ -177,10 +177,6 @@ SELECT ctid, reltoastrelid
FROM pg_catalog.pg_class fk
WHERE reltoastrelid != 0 AND
NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastrelid);
-SELECT ctid, reltoastidxid
-FROM pg_catalog.pg_class fk
-WHERE reltoastidxid != 0 AND
- NOT EXISTS(SELECT 1 FROM pg_catalog.pg_class pk WHERE pk.oid = fk.reltoastidxid);
SELECT ctid, collnamespace
FROM pg_catalog.pg_collation fk
WHERE collnamespace != 0 AND
diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql
index d5a3571..6361297 100644
--- a/src/test/regress/sql/rules.sql
+++ b/src/test/regress/sql/rules.sql
@@ -872,7 +872,7 @@ create rule "_RETURN" as on select to fooview do instead
select * from fooview;
select xmin, * from fooview; -- fail, views don't have such a column
-select reltoastrelid, reltoastidxid, relkind, relfrozenxid
+select reltoastrelid, relkind, relfrozenxid
from pg_class where oid = 'fooview'::regclass;
drop view fooview;
diff --git a/src/tools/findoidjoins/README b/src/tools/findoidjoins/README
index b5c4d1b..e3e8a2a 100644
--- a/src/tools/findoidjoins/README
+++ b/src/tools/findoidjoins/README
@@ -86,7 +86,6 @@ Join pg_catalog.pg_class.relowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_class.relam => pg_catalog.pg_am.oid
Join pg_catalog.pg_class.reltablespace => pg_catalog.pg_tablespace.oid
Join pg_catalog.pg_class.reltoastrelid => pg_catalog.pg_class.oid
-Join pg_catalog.pg_class.reltoastidxid => pg_catalog.pg_class.oid
Join pg_catalog.pg_collation.collnamespace => pg_catalog.pg_namespace.oid
Join pg_catalog.pg_collation.collowner => pg_catalog.pg_authid.oid
Join pg_catalog.pg_constraint.connamespace => pg_catalog.pg_namespace.oid
On Thu, Jul 4, 2013 at 2:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Jul 4, 2013 at 2:36 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
Wouldn't it make more sense to fetch the toast index oid in the query
on top instead of making a query for every relation?
+1
I changed the query that way. Updated version of the patch attached.
Also I updated the rules.out because Michael changed the system_views.sql.
Otherwise, the regression test would fail. Will commit this patch.
Committed. So, let's get to REINDEX CONCURRENTLY patch!
Regards,
--
Fujii Masao
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Jul 4, 2013 at 3:26 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Jul 4, 2013 at 2:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
On Thu, Jul 4, 2013 at 2:36 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-07-04 02:32:32 +0900, Michael Paquier wrote:
Wouldn't it make more sense to fetch the toast index oid in the query
on top instead of making a query for every relation?
+1
I changed the query that way. Updated version of the patch attached.
Also I updated the rules.out because Michael changed the system_views.sql.
Otherwise, the regression test would fail. Will commit this patch.
Committed. So, let's get to REINDEX CONCURRENTLY patch!
Thanks for the hard work! I'll work on something based on MVCC
catalogs, so at least the lock will be lowered at the swap phase and
isolation tests will be added.
--
Michael
Hi,
I noticed some errors in the comments of the patch committed. Please
find attached a patch to correct that.
Regards,
--
Michael
Attachment: 20130704_reltoastidxid_comments.patch
diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 675bfcc..c76dc24 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1374,7 +1374,7 @@ toast_save_datum(Relation rel, Datum value,
toastrel = heap_open(rel->rd_rel->reltoastrelid, RowExclusiveLock);
toasttupDesc = toastrel->rd_att;
- /* Open all the toast indexes and look for the valid */
+ /* Open all the toast indexes and look for the valid one */
validIndex = toast_open_indexes(toastrel,
RowExclusiveLock,
&toastidxs,
@@ -1546,7 +1546,7 @@ toast_save_datum(Relation rel, Datum value,
*/
for (i = 0; i < num_indexes; i++)
{
- /* Only index relations marked as ready can updated */
+ /* Only index relations marked as ready can be updated */
if (IndexIsReady(toastidxs[i]->rd_index))
index_insert(toastidxs[i], t_values, t_isnull,
&(toasttup->t_self),
On Thu, Jul 4, 2013 at 3:38 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Hi,
I noticed some errors in the comments of the patch committed. Please
find attached a patch to correct that.
Committed. Thanks!
Regards,
--
Fujii Masao
Hi all,
Please find attached the patch using MVCC catalogs. I have split the
previous core patch into 3 pieces to facilitate review and reduce the
size of the main patch, as it contained a lot of code refactoring.
0) 20130705_0_procarray.patch, this patch adds a set of generic APIs
in procarray.c that can be used to wait for snapshots older than a
given xmin, or to wait for some virtual locks. This code has been
taken from CREATE/DROP INDEX CONCURRENTLY, and I think that this set
of APIs could be used for the implementation of other concurrent DDLs.
1) 20130705_1_index_conc_struct.patch, this patch slightly refactors
CREATE/DROP INDEX CONCURRENTLY to create 2 generic APIs: one for the build
of a concurrent index, and one for the step where it is set as dead.
2) 20130705_2_reindex_concurrently_v28.patch, with the core feature. I
have added some stuff here:
- isolation tests (perhaps it would be better to make the DML actions
last longer in those tests?)
- reduction of the lock taken at the swap phase from AccessExclusiveLock
to ShareUpdateExclusiveLock, with a wait for old snapshots added at the
end of the swap phase, before commit, to be sure that no transaction will
use the old relfilenode once the swap is committed
- doc update
- simplified some APIs, like the removal of index_concurrent_clear_valid
- fixed a bug where it was not possible to reindex a toast relation concurrently
Patch 1 depends on 0, Patch 2 depends on 1 and 0. Patch 0 can be
applied directly on master.
The first two patches are pretty simple; patch 0 could even be quickly
reviewed and approved to provide some more infrastructure that could
possibly be used by some other patches around, like REFRESH
CONCURRENTLY...
I have also done some tests with the set of patches:
- Manual testing, checking that the process went smoothly by taking
manual checkpoints during each phase of REINDEX CONCURRENTLY
- Ran make check for regression and isolation tests
- Ran make installcheck, and then REINDEX DATABASE CONCURRENTLY on the
regression database that remained on the server
Regards,
--
Michael
Attachment: 20130705_0_procarray.patch
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8525cb9..2a37cf2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1325,7 +1325,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1447,11 +1446,8 @@ index_drop(Oid indexId, bool concurrent)
/*
* Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
+ * index for a query. Note we do not need to worry about xacts that
+ * open the table for reading after this point; they will see the
* index as invalid when they open the relation.
*
* Note: the reason we use actual lock acquisition here, rather than
@@ -1459,18 +1455,8 @@ index_drop(Oid indexId, bool concurrent)
* possible if one of the transactions in question is blocked trying
* to acquire an exclusive lock on our table. The lock code will
* detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* No more predicate locks will be acquired on this index, and we're
@@ -1514,13 +1500,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 9d9745e..375a519 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -321,13 +321,9 @@ DefineIndex(IndexStmt *stmt,
IndexInfo *indexInfo;
int numberOfAttributes;
TransactionId limitXmin;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -652,10 +648,7 @@ DefineIndex(IndexStmt *stmt,
* for an overview of how this works)
*
* Now we must wait until no running transaction could have the table open
- * with the old list of indexes. To do this, inquire which xacts
- * currently would conflict with ShareLock on the table -- ie, which ones
- * have a lock that permits writing the table. Then wait for each of
- * these xacts to commit or abort. Note we do not need to worry about
+ * with the old list of indexes. Note we do not need to worry about
* xacts that open the table for writing after this point; they will see
* the new index when they open it.
*
@@ -664,18 +657,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -739,13 +722,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -786,74 +763,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(limitXmin);
/*
* Index can now be marked valid -- update its pg_index entry
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index c2f86ff..ac1f3ec 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2567,6 +2567,153 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds any of the given locks. To do this,
+ * inquire which xacts currently would conflict with each lock tag at the
+ * given lockmode -- ie, which ones hold a lock that conflicts with it.
+ * Then wait for each of these xacts to commit or abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ /* Clean up */
+ pfree(old_lockholders);
+}
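As a side note for reviewers, the structure of WaitForMultipleVirtualLocks is: snapshot every conflict list up front, then wait each entry out exactly once. A minimal standalone sketch of that two-phase pattern, with simplified stand-in types (a zero-terminated `Vxid` array instead of a VirtualTransactionId list; all names here are hypothetical, not PostgreSQL code):

```c
#include <stddef.h>

/* Hypothetical stand-in for a zero-terminated VirtualTransactionId list,
 * as GetLockConflicts() would return one per lock tag. */
typedef unsigned int Vxid;

/* Two-phase wait: the conflict list for every tag has already been
 * collected; now wait on each entry exactly once.  Returns the number
 * of waits, i.e. how many VirtualXactLock(vxid, true) calls the real
 * code would make. */
static int
wait_for_multiple(const Vxid *const conflicts[], int ntags)
{
    int waits = 0;

    for (int i = 0; i < ntags; i++)
        for (const Vxid *v = conflicts[i]; *v != 0; v++)
            waits++;            /* VirtualXactLock(*v, true) here */
    return waits;
}
```

Collecting all lists before waiting keeps the set of transactions to wait on fixed at the moment the locks are inspected.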
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock tag.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * xmin limit, because the recently built index might not contain tuples
+ * deleted just before the reference snapshot was taken. Obtain a list of
+ * VXIDs of such transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin given;
+ * their oldest snapshot must be newer than our xmin limit.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(TransactionId limitXmin)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
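The rechecking loop in WaitForOldSnapshots reduces to one filtering step: any vxid from the original list that no longer shows up in a fresh GetCurrentVirtualXIDs() result can be forgotten. A minimal standalone sketch of just that step (simplified types; `forget_missing` is a hypothetical helper, not PostgreSQL code):

```c
#define INVALID_VXID 0u

typedef unsigned int Vxid;

/* Invalidate entries of old[] (from position `start` onward) that no
 * longer appear in newer[]: a vxid that vanished has either ended or
 * dropped its snapshot, so there is no need to wait on it anymore. */
static void
forget_missing(Vxid old[], int n_old,
               const Vxid newer[], int n_newer, int start)
{
    for (int j = start; j < n_old; j++)
    {
        int k;

        if (old[j] == INVALID_VXID)
            continue;           /* found uninteresting in a previous cycle */
        for (k = 0; k < n_newer; k++)
            if (old[j] == newer[k])
                break;
        if (k >= n_newer)       /* not there anymore */
            old[j] = INVALID_VXID;
    }
}
```

This avoids blocking on backends that went idle-in-transaction with xmin zero between two iterations.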
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index c5f58b4..4df51b0 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -77,4 +77,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(TransactionId limitXmin);
+
#endif /* PROCARRAY_H */
Attachment: 20130705_1_index_conc_struct.patch (application/octet-stream)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2a37cf2..db5917b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1091,6 +1091,100 @@ index_create(Relation heapRelation,
}
/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken
+ * when this operation is performed so that only schema changes are blocked.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel, indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in the
+ * commit of the transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
+{
+ Relation heapRelation, indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexOid, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
@@ -1444,50 +1538,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. Note we do not need to worry about xacts that
- * open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- */
- WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(heapId, indexId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 375a519..5d0815c 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -311,7 +311,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -678,27 +677,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..9f29003 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -62,6 +62,14 @@ extern Oid index_create(Relation heapRelation,
bool concurrent,
bool is_internal);
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_set_dead(Oid heapOid,
+ Oid indexOid,
+ LOCKTAG locktag);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
Attachment: 20130705_2_reindex_concurrently_v28.patch (application/octet-stream)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 316add7..f454caa 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..5f42c4f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,22 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To rebuild the index without interfering
+ with production you should either use <command>REINDEX CONCURRENTLY</>,
+ or drop the index and reissue the <command>CREATE INDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP
+ INDEX</>, including invalid toast indexes.
</para>
</listitem>
@@ -139,6 +152,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +259,115 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt and, in
+ addition, it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the
+ one to be rebuilt is actually entered into the system catalogs in one
+ transaction, then two table scans occur in two more transactions. Once
+ this is done, the old and new indexes are swapped. Finally, two additional
+ transactions are used to mark the concurrent index as not ready and then
+ drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and retry <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This works as well with indexes of toast
+ relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Valid indexes, being unique
+ for a given toast relation, cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds concurrently only the non-system relations. System
+ relations are rebuilt in a non-concurrent way. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved during the operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is done with a <literal>SHARE UPDATE EXCLUSIVE</literal> lock.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +399,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild all the indexes of a table, while allowing read and write
+ operations on the relations involved during the rebuild:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index db5917b..dd192cb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that is used as a duplicate of an
+ * existing index created during a concurrent operation. This index can
+ * also be on a toast relation. Sufficient locks are normally already taken
+ * on the related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild; there is no other way to ask for it in the grammar
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1090,6 +1100,190 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into the catalogs and needs to be built
+ * later on. This is called during concurrent index processing. The heap
+ * relation on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get expressions associated with this index, for building column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the name picked has any conflict with existing names and
+ * change it.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, NoLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
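As a side note, the column-name deduplication above (clip the original name so that a numeric suffix still fits under NAMEDATALEN) can be illustrated standalone. This is a hypothetical byte-oriented sketch: the real code uses pg_mbcliplen to clip at a multibyte character boundary, and make_numbered_name plus the hard-coded NAMEDATALEN here are assumptions of the sketch, not part of the patch:

```c
#include <assert.h>
#include <string.h>
#include <stdio.h>

#define NAMEDATALEN 64          /* as in PostgreSQL's pg_config_manual.h */

/*
 * Append counter j to origname, clipping origname so that the result
 * stays shorter than NAMEDATALEN. Byte-oriented stand-in for the
 * pg_mbcliplen-based logic above.
 */
void
make_numbered_name(char *buf, const char *origname, int j)
{
    char nbuf[32];
    int nlen;

    snprintf(nbuf, sizeof(nbuf), "%d", j);

    /* Ensure the generated name is shorter than NAMEDATALEN */
    nlen = (int) strlen(origname);
    if (nlen > NAMEDATALEN - 1 - (int) strlen(nbuf))
        nlen = NAMEDATALEN - 1 - (int) strlen(nbuf);

    memcpy(buf, origname, nlen);
    strcpy(buf + nlen, nbuf);
}
```

A short name simply gets the counter appended ("col" with j = 2 gives "col2"), while an overlong name is truncated so the whole result is NAMEDATALEN - 1 bytes.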
+
+
/*
* index_concurrent_build
*
@@ -1128,6 +1322,65 @@ index_concurrent_build(Oid heapOid,
index_close(indexRelation, NoLock);
}
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old index and the new index in a concurrent context. For the
+ * time being, what is done here is switching the relfilenodes of the two
+ * indexes. If extra operations become necessary during a concurrent swap,
+ * they should be added here. The relations do not require an exclusive
+ * lock, thanks to MVCC catalog access through the relcache.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take a necessary lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, ShareUpdateExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, ShareUpdateExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swap happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
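For readers unfamiliar with relfilenodes: the swap above only exchanges two pg_class fields, so the old index's OID (and name) ends up pointing at the freshly built storage while the new index inherits the stale file. A minimal standalone sketch of that invariant follows; the struct and function names here are illustrative stand-ins for the syscache tuples, not PostgreSQL APIs:

```c
#include <assert.h>

typedef unsigned int Oid;

/* Stand-in for the relfilenode field of a pg_class row */
typedef struct PgClassSketch
{
    Oid relfilenode;    /* physical storage the relation points at */
} PgClassSketch;

/*
 * Mirror of the core of index_concurrent_swap(): exchange the physical
 * storage of the old and new index while their OIDs and names stay put.
 */
void
swap_relfilenode(PgClassSketch *oldIndex, PgClassSketch *newIndex)
{
    Oid tmpnode = oldIndex->relfilenode;

    oldIndex->relfilenode = newIndex->relfilenode;
    newIndex->relfilenode = tmpnode;
}
```

The real code performs the same three assignments on copies of the pg_class tuples and then writes them back with simple_heap_update().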
+
/*
* index_concurrent_set_dead
*
@@ -1185,6 +1438,71 @@ index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
}
/*
+ * index_concurrent_drop
+ *
+ * Drop a single index concurrently as the last step of an index concurrent
+ * process. Deletion is done through performDeletion; otherwise the
+ * dependencies of the index would not get dropped. At this point all the
+ * indexes are already considered invalid and dead, so they can be dropped
+ * without using any concurrent option, as it is certain that they will not
+ * interact with other server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not live; a live index might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 5d0815c..67d5576 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -449,7 +450,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -596,7 +598,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -777,6 +779,537 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for a given relation Oid. The relation can be
+ * either an index or a table. If a table is specified, each reindexing step
+ * is applied to all of the table's indexes at once, including its dependent
+ * toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+ TransactionId limitXmin;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given Oid is a
+ * table, all its valid indexes will be rebuilt, including the indexes of
+ * its associated toast table. If the relkind is an index, that index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session lock
+ * is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ case RELKIND_TOASTVALUE:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation = heap_open(relationOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* The relation the indexes are based on cannot be shared */
+ if (heapRelation->rd_rel->relisshared)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create an index based on the same data as each former
+ * index, except that it is only registered in the catalogs and will be
+ * built later. All these operations can be done at the same time for all
+ * the indexes of a parent relation, including the indexes of its toast
+ * relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index, which might be a toast table */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also
+ * needed on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid to protect each concurrent relation from a drop,
+ * then close the relations. The lockrelid of the parent relation is
+ * not taken here, to avoid taking multiple locks on the same relation;
+ * instead we rely on parentRelationIds built earlier. Each LockRelId
+ * must be palloc'd: the list stores pointers, so appending the address
+ * of a loop-local variable would leave every entry pointing at the
+ * same reused stack slot.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock for the following visibility checks, as other
+ * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add the lockrelid of the parent relation to the list of locked
+ * relations. It must be palloc'd: the list stores pointers that
+ * outlive this loop iteration.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the old
+ * index and its concurrent copy, to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping transactions open for an unnecessarily long time. A concurrent
+ * build is done for each concurrent index that will replace an old one.
+ * Before doing that, we need to wait until no running transaction could
+ * still have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by the previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Close the old index only once the build is done: closing it first
+ * and then dereferencing indexRel->rd_index would use the relcache
+ * entry after its reference was released.
+ */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update of the
+ * concurrent index visible.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index is validated in
+ * a separate transaction, to avoid keeping a transaction open for an
+ * unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * We can now do away with our active snapshot, but we still need to
+ * save its xmin limit to wait for older snapshots.
+ */
+ limitXmin = snapshot->xmin;
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * This concurrent index is now valid, as it contains all the tuples
+ * necessary. However, it might not have taken into account tuples
+ * deleted before the reference snapshot was taken, so we need to wait
+ * for the transactions that might have older snapshots than ours.
+ */
+ WaitForOldSnapshots(limitXmin);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes have been validated and could be used,
+ * we need to swap each concurrent index with its corresponding old index.
+ * Note that the concurrent index used for the swap is not marked as
+ * valid, because we need to keep the former index and the concurrent
+ * index with different validity statuses, to avoid an explosion in the
+ * number of indexes a parent relation could have if this operation fails
+ * multiple times in a row for one reason or another. Note that we
+ * already know, thanks to the validation step, that the concurrent
+ * index contains all the tuples it needs.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ relOid = IndexGetRelation(indOid, false);
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* We need the xmin limit to wait for older snapshots. */
+ snapshot = GetTransactionSnapshot();
+ limitXmin = snapshot->xmin;
+
+ /*
+ * Before committing, we need to wait for the transactions that might
+ * still need the pre-swap index information.
+ */
+ WaitForOldSnapshots(limitXmin);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the former
+ * indexes, and there might still be transactions that use them. Mark
+ * the concurrent indexes as dead; each operation is performed in a
+ * separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation before
+ * setting the index as dead.
+ */
+ index_concurrent_set_dead(relOid, indOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes, which now hold the old data. This needs
+ * to be done through performDeletion, or the dependencies of the old
+ * indexes would not be dropped. The internal mechanism of DROP INDEX
+ * CONCURRENTLY is not used, as at this point the indexes are already
+ * considered dead and invalid, so they will not be used by other
+ * backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is to release the session-level locks on the
+ * parent tables and on the indexes of those tables.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
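To summarize the control flow of ReindexRelationConcurrently() above, here is the six-phase sequence as a small compile-time table. This is purely a documentation aid; the identifiers below are mine, not symbols from the patch:

```c
#include <assert.h>
#include <string.h>

/*
 * The six phases of REINDEX CONCURRENTLY as implemented above. Each
 * phase runs in its own transaction(s), session-level locks are held
 * across the intermediate commits, and waits between phases let
 * concurrent transactions drain.
 */
const char *const reindex_concurrent_phases[] = {
    "1: create new indexes in the catalogs, invalid and not ready",
    "2: build each new index, then mark it indisready",
    "3: validate each new index against a reference snapshot",
    "4: swap the relfilenodes of each old/new index pair",
    "5: mark the old (now swapped) indexes as dead",
    "6: drop the dead indexes through performDeletion()",
};

const int reindex_concurrent_nphases =
    (int) (sizeof(reindex_concurrent_phases) /
           sizeof(reindex_concurrent_phases[0]));
```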
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1439,7 +1972,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1465,6 +1999,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1577,18 +2118,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1657,13 +2202,27 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1682,7 +2241,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1694,6 +2256,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed on system catalogs, but it is
+ * allowed on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system catalogs concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1777,15 +2348,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed with the normal process, pg_class included,
+ * as they could be corrupted and the concurrent process itself relies
+ * on them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 6708725..e0a9ce2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -903,6 +903,38 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
if (classform->relkind != relkind)
DropErrorMsgWrongType(rel->relname, classform->relkind, relkind);
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(state->heapOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Leave if index entry is not valid */
+ if (!indisvalid)
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+ }
+
/* Allow DROP to either table owner or schema owner */
if (!pg_class_ownercheck(relOid, GetUserId()) &&
!pg_namespace_ownercheck(classform->relnamespace, GetUserId()))
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cf7fb72..46ddcba 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1201,6 +1201,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when created in a concurrent context,
+ * and this code path cannot be reached by CREATE INDEX CONCURRENTLY,
+ * since that feature is not available for exclusion constraints; hence
+ * this code path can only be reached by REINDEX CONCURRENTLY. In that
+ * case the same index exists in parallel to this one, so we can bypass
+ * this check, as it has already been done on the other index. If
+ * exclusion constraints are ever supported by CREATE INDEX CONCURRENTLY,
+ * this will need to be removed or completed accordingly.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index b5b8d63..64abdde 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3f96595..65f2279 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1839,6 +1839,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f67ef0c..cf1bae5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6769,29 +6769,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index c940897..abac9eb 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -778,16 +778,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -799,8 +803,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9f29003..ab45c67 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,16 +60,25 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
extern void index_concurrent_build(Oid heapOid,
Oid indexOid,
bool isprimary);
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
extern void index_concurrent_set_dead(Oid heapOid,
Oid indexOid,
LOCKTAG locktag);
+extern void index_concurrent_drop(Oid indexOid);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index fa9f41f..0b965da 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index de22dff..904fff4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2544,6 +2544,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/test/isolation/expected/reindex-concurrently.out b/src/test/isolation/expected/reindex-concurrently.out
new file mode 100644
index 0000000..9e04169
--- /dev/null
+++ b/src/test/isolation/expected/reindex-concurrently.out
@@ -0,0 +1,78 @@
+Parsed test spec with 3 sessions
+
+starting permutation: reindex sel1 upd2 ins2 del2 end1 end2
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab;
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+
+starting permutation: sel1 reindex upd2 ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 reindex ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 reindex del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 reindex end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 end1 reindex end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end2: COMMIT;
+step reindex: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 081e11f..fb4c1a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -20,4 +20,5 @@ test: delete-abort-savept
test: delete-abort-savept-2
test: aborted-keyrevoke
test: drop-index-concurrently-1
+test: reindex-concurrently
test: timeouts
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..eb59fe0
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent DML operations work correctly while a table is
+# being reindexed concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE CONCURRENTLY reind_con_tab; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 37dea0a..47ccc0e 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,58 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index d025cbc..383faa5 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,43 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
Hi all,
I am resending the patches after Fujii-san noticed a bug in the latest
code that made it possible to drop even valid toast indexes... While
looking at that, I found a couple of other bugs:
- two bugs, now fixed, in the code path added in tablecmds.c to
allow the manual drop of invalid toast indexes:
-- Even a user having no permission on the parent toast table could
drop an invalid toast index
-- A lock was not taken on the parent toast relation, as is done for
all other indexes dropped with DROP INDEX
- Trying to concurrently reindex a mapped catalog leads to an error.
As mapped catalogs have no relfilenode, I think it makes sense to
block concurrent reindex in this case, so I modified the core patch
accordingly.
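As a quick illustration of the fixed paths, a session on a server with the
patch applied could look like the sketch below; the table name is purely an
example, and pg_class stands for any mapped catalog:

```sql
-- Illustrative sketch only, assuming a server with the patch applied.
CREATE TABLE reind_tab (id int PRIMARY KEY, payload text); -- has a toast relation
REINDEX TABLE CONCURRENTLY reind_tab;  -- the toast index is rebuilt as well
REINDEX TABLE CONCURRENTLY pg_class;   -- mapped catalog, now rejected with an error
```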
Regards,
On Fri, Jul 5, 2013 at 1:47 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
Hi all,
Please find attached the patch using MVCC catalogs. I have split the
previous core patch into 3 pieces to facilitate the review and reduce
the size of the main patch as the previous core patch contained a lot
of code refactoring.
0) 20130705_0_procarray.patch, this patch adds a set of generic APIs
in procarray.c that can be used to wait for snapshots older than a
given xmin, or to wait on some virtual locks. This code has been
extracted from CREATE/DROP INDEX CONCURRENTLY, and I think that this
set of APIs could be used for the implementation of other concurrent DDLs.
1) 20130705_1_index_conc_struct.patch, this patch refactors CREATE/DROP
INDEX CONCURRENTLY a bit, creating two generic APIs: one for the build
of a concurrent index, and one for the step where it is set as dead.
2) 20130705_2_reindex_concurrently_v28.patch, with the core feature. I
have added some stuff here:
- isolation tests (perhaps it would be better to make the DML actions
last longer in those tests?)
- reduction of the lock taken at the swap phase from AccessExclusiveLock
to ShareUpdateExclusiveLock, plus a wait for old snapshots at the end of
the swap phase, before its commit, to be sure that no transaction will
use the old relfilenode once it has been swapped
- doc update
- simplified some APIs, like the removal of index_concurrent_clear_valid
- fixed a bug where it was not possible to concurrently reindex a toast
relation
Patch 1 depends on 0, and patch 2 depends on both 1 and 0. Patch 0 can
be applied directly on master. The first two patches are pretty simple;
patch 0 could even be quickly reviewed and approved to provide some more
infrastructure that could possibly be used by other patches around, like
REFRESH CONCURRENTLY.
I have also done some tests with the set of patches:
- Manual testing, checking that the process went smoothly by taking
some manual checkpoints during each phase of REINDEX CONCURRENTLY
- Ran make check for regression and isolation tests
- Ran make installcheck, and then REINDEX DATABASE CONCURRENTLY on the
regression database that remained on the server
Regards,
--
Michael
Attachment: 20130711_0_procarray.patch
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 8525cb9..2a37cf2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1325,7 +1325,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1447,11 +1446,8 @@ index_drop(Oid indexId, bool concurrent)
/*
* Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
+ * index for a query. Note we do not need to worry about xacts that
+ * open the table for reading after this point; they will see the
* index as invalid when they open the relation.
*
* Note: the reason we use actual lock acquisition here, rather than
@@ -1459,18 +1455,8 @@ index_drop(Oid indexId, bool concurrent)
* possible if one of the transactions in question is blocked trying
* to acquire an exclusive lock on our table. The lock code will
* detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* No more predicate locks will be acquired on this index, and we're
@@ -1514,13 +1500,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ec8f248..e913475 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -321,13 +321,9 @@ DefineIndex(IndexStmt *stmt,
IndexInfo *indexInfo;
int numberOfAttributes;
TransactionId limitXmin;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -652,10 +648,7 @@ DefineIndex(IndexStmt *stmt,
* for an overview of how this works)
*
* Now we must wait until no running transaction could have the table open
- * with the old list of indexes. To do this, inquire which xacts
- * currently would conflict with ShareLock on the table -- ie, which ones
- * have a lock that permits writing the table. Then wait for each of
- * these xacts to commit or abort. Note we do not need to worry about
+ * with the old list of indexes. Note we do not need to worry about
* xacts that open the table for writing after this point; they will see
* the new index when they open it.
*
@@ -664,18 +657,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -739,13 +722,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -786,74 +763,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(limitXmin);
/*
* Index can now be marked valid -- update its pg_index entry
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index c2f86ff..ac1f3ec 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2567,6 +2567,153 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no running transaction holds any of the given locks. To do
+ * this, inquire which xacts currently would conflict with each lock tag
+ * at the given LOCKMODE -- ie, which ones hold a conflicting lock on the
+ * relation it refers to. Then wait for each of these xacts to commit or
+ * abort.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ /* Clean up */
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock tag.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * xmin limit, because the caller's freshly built index might not contain
+ * tuples deleted just before the reference snapshot was taken. Obtain a
+ * list of VXIDs of such transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin given;
+ * their oldest snapshot must be newer than our xmin limit.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(TransactionId limitXmin)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index c5f58b4..4df51b0 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -77,4 +77,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(TransactionId limitXmin);
+
#endif /* PROCARRAY_H */
Attachment: 20130711_1_index_conc_struct.patch
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 2a37cf2..db5917b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1091,6 +1091,100 @@ index_create(Relation heapRelation,
}
/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Only low-level locks are
+ * taken during this operation, so that it blocks schema changes but not
+ * reads and writes on the parent relation.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel, indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in
+ * commit of transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
+{
+ Relation heapRelation, indexRelation;
+
+ /*
+ * Now we must wait, if necessary, until no running transaction could
+ * be using the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexOid, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
@@ -1444,50 +1538,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. Note we do not need to worry about xacts that
- * open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- */
- WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(heapId, indexId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index e913475..4ed9812 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -311,7 +311,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -678,27 +677,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..9f29003 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -62,6 +62,14 @@ extern Oid index_create(Relation heapRelation,
bool concurrent,
bool is_internal);
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_set_dead(Oid heapOid,
+ Oid indexOid,
+ LOCKTAG locktag);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
Attachment: 20130711_2_reindex_concurrently_v29.patch (application/octet-stream)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index 316add7..f454caa 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..5f42c4f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,22 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>,
+ including invalid toast indexes.
</para>
</listitem>
@@ -139,6 +152,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +259,115 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is performed,
+ the old and new indexes are swapped. Finally, two additional transactions
+ are used to mark the concurrent index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This also works for indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Valid toast indexes cannot be
+ dropped, as each toast relation has exactly one valid index.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds only non-system relations concurrently; system
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved in the operation. When <command>CONCURRENTLY</command>
+ is specified, a <literal>SHARE UPDATE EXCLUSIVE</literal> lock is used instead.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +399,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild all the indexes of a table while allowing read and write
+ operations on the involved relations:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index db5917b..dd192cb 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index that duplicates an existing index,
+ * as is done during a concurrent reindex operation. This index can also
+ * belong to a toast relation. Sufficient locks are expected to already be
+ * held on the related relations when this is called during a concurrent
+ * operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild; there is no other way to ask for it in the grammar anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1090,6 +1100,190 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* The concurrent index uses the same index information as the former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initially deferred; this depends on
+ * its dependent constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, needed to build the column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * If the name picked conflicts with an already-chosen column name,
+ * adjust it until it is unique.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, NoLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
/*
* index_concurrent_build
*
@@ -1128,6 +1322,65 @@ index_concurrent_build(Oid heapOid,
index_close(indexRelation, NoLock);
}
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old and new indexes in a concurrent context. For the time being
+ * this only switches the relfilenode of the two indexes. If extra
+ * operations become necessary during a concurrent swap, they should be
+ * added here. The relations do not require an exclusive lock thanks to
+ * MVCC catalog access.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take the necessary locks on the old and new indexes before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, ShareUpdateExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, ShareUpdateExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swap happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
/*
* index_concurrent_set_dead
*
@@ -1185,6 +1438,71 @@ index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
}
/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent reindex process.
+ * Deletion has to go through performDeletion, or the dependencies of
+ * the index would not get dropped. At this point the index is already
+ * considered invalid and dead, so it can be dropped without any
+ * concurrent options, since it is certain that it will not interact
+ * with other server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it
+ * might still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 4ed9812..b921e65 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -449,7 +450,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -596,7 +598,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -777,6 +779,545 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation
+ * can be either an index or a table. If a table is specified, each step
+ * of the reindexing is applied to all of the table's indexes, including
+ * its dependent toast indexes, at the same time.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+ TransactionId limitXmin;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including the
+ * indexes of its associated toast table. If the relkind is an index, the
+ * index itself will be rebuilt. The locks taken on the parent relations
+ * and the involved indexes are kept until this transaction is committed,
+ * to protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ case RELKIND_TOASTVALUE:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation;
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (IsSharedRelation(relationOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* A system catalog cannot be reindexed concurrently */
+ if (IsSystemNamespace(get_rel_namespace(relationOid)))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for catalog relations")));
+
+ /* Open relation to get its indexes */
+ heapRelation = heap_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index to rebuild, a new index based
+ * on the same data as the old one; at this stage it is only registered
+ * in the catalogs and will be built later. All these operations can be
+ * performed at once for a parent relation, including the indexes of
+ * its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index; might be a toast or plain relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also
+ * needed on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid of the old and concurrent indexes to protect
+ * them from being dropped before the relations are closed. The
+ * lockrelid of the parent relation is not taken here, to avoid taking
+ * multiple locks on the same relation; we rely instead on
+ * parentRelationIds built earlier. Note that each entry appended to
+ * relationLocks must point to storage that outlives this loop, hence
+ * the palloc'd copies; appending the address of the local variable
+ * would leave every entry pointing at the same, soon-stale memory.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap lock and lock tag of each parent relation for the
+ * following visibility waits, where other backends might conflict
+ * with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the parent relation's lockrelid to the list;
+ * the address of the local variable would not survive this loop.
+ */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transaction will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * is dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping transactions open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index, which will then
+ * replace its old index. Before doing that, we need to wait on the
+ * parent relations until no running transaction could still have the
+ * parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation has been closed by the previous commit, so
+ * reopen it to fetch the information needed for the build, saving
+ * that information before the relation is closed again.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapOid = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(heapOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any insertions
+ * that might have occurred in the parent table in the meantime.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction, to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the concurrent indexes
+ * validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * We can now do away with our active snapshot, but we still need to
+ * save the xmin limit to wait for older snapshots.
+ */
+ limitXmin = snapshot->xmin;
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * This concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not have taken into account tuples deleted
+ * before the reference snapshot was taken, so we need to wait for the
+ * transactions that might have snapshots older than ours.
+ */
+ WaitForOldSnapshots(limitXmin);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes have been validated and could be
+ * used, we need to swap each concurrent index with its corresponding
+ * old index. Note that the index used for the swap is not marked as
+ * valid: we need to keep the former index and the concurrent index
+ * with different valid statuses, to avoid an explosion in the number
+ * of indexes a parent relation could accumulate if this operation
+ * failed repeatedly for one reason or another.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ relOid = IndexGetRelation(indOid, false);
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* We need the xmin limit to wait for older snapshots. */
+ snapshot = GetTransactionSnapshot();
+ limitXmin = snapshot->xmin;
+
+ /*
+ * Before committing, we need to wait for the transactions that might
+ * still need the pre-swap index information.
+ */
+ WaitForOldSnapshots(limitXmin);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes and must be
+ * marked as dead so that transactions stop using them. Each operation
+ * is performed in a separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to
+ * wait for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation
+ * before setting the index as dead.
+ */
+ index_concurrent_set_dead(relOid, indOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes, which at this point hold the old data.
+ * This needs to be done through performDeletion, or the dependencies
+ * of the old indexes would not be dropped. The internal mechanism of
+ * DROP INDEX CONCURRENTLY is not used, as the indexes are already
+ * considered dead and invalid, so they will not be used by other
+ * backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this concurrent index, which now holds the old data */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Last thing to do is release the session-level locks on the parent
+ * table and its indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get a fresh snapshot for the end of the process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
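
For reviewers wanting to poke at the intermediate states described in the phase comments above, here is a rough psql sketch. It is purely illustrative, assuming the patch is applied; reind_tab and reind_idx are hypothetical names:

```sql
-- Session 1: rebuild one index without taking an exclusive lock.
-- Note that CONCURRENTLY goes before the relation name.
REINDEX INDEX CONCURRENTLY reind_idx;

-- Session 2, while phases 2/3 are in flight: the transient index
-- (old name plus a _cct suffix) shows up next to the old one.
SELECT indexrelid::regclass, indisvalid, indisready
FROM pg_index
WHERE indrelid = 'reind_tab'::regclass;
-- reind_idx_cct should appear with indisvalid = false until the
-- validation phase completes, after which the two entries are swapped.
```
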
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1439,7 +1980,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1465,6 +2007,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1577,18 +2126,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1657,13 +2210,27 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid, REINDEX_REL_PROCESS_TOAST))
ereport(NOTICE,
@@ -1682,7 +2249,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1694,6 +2264,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for REINDEX SYSTEM, but it
+ * is for REINDEX DATABASE, where system catalogs are then skipped.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1777,15 +2356,40 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid, REINDEX_REL_PROCESS_TOAST))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself relies
+ * on them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid, REINDEX_REL_PROCESS_TOAST);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
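
To make the naming behavior of the ChooseIndexName() change above concrete: the transient index name is just the old index name run through ChooseRelationName() with the "cct" label, so conflicts are resolved the usual way. A hypothetical sketch, assuming the patch is applied:

```sql
-- Names here are hypothetical.
CREATE TABLE t (a int);
CREATE UNIQUE INDEX t_a_key ON t (a);
REINDEX INDEX CONCURRENTLY t_a_key;
-- While running, the transient index is named t_a_key_cct; if that
-- name is already taken, ChooseRelationName retries with a numbered
-- label (cct1, cct2, ...) exactly as for other generated index names.
```
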
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index f56ef28..2e78124 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -874,6 +874,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
char relkind;
Form_pg_class classform;
LOCKMODE heap_lockmode;
+ bool invalid_system_index = false;
state = (struct DropRelationCallbackState *) arg;
relkind = state->relkind;
@@ -909,7 +910,37 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
rel->relname);
- if (!allowSystemTableMods && IsSystemClass(classform))
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Mark object as being an invalid index of system catalogs */
+ if (!indisvalid)
+ invalid_system_index = true;
+ }
+
+ /* In the case of an invalid index, it is fine to bypass this check */
+ if (!invalid_system_index && !allowSystemTableMods && IsSystemClass(classform))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied: \"%s\" is a system catalog",
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index cf7fb72..46ddcba 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1201,6 +1201,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when it was created in a concurrent
+ * context. As this code path cannot be taken by CREATE INDEX
+ * CONCURRENTLY (the feature is not available for exclusion
+ * constraints), it can only be reached through REINDEX CONCURRENTLY.
+ * In that case the same index exists in parallel to this one, so we
+ * can bypass the check here; it has already been done on the other
+ * index. If exclusion constraints are ever supported by CREATE INDEX
+ * CONCURRENTLY, this will need to be removed or revisited.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index b5b8d63..64abdde 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3617,6 +3617,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 3f96595..65f2279 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1839,6 +1839,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index f67ef0c..cf1bae5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6769,29 +6769,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
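
Summarizing the grammar change above: opt_concurrently sits between the object keyword and the name for all three forms. A hedged cheat sheet (object names are placeholders; note that REINDEX SYSTEM CONCURRENTLY parses, then fails later in ReindexDatabase()):

```sql
REINDEX INDEX CONCURRENTLY some_index;
REINDEX TABLE CONCURRENTLY some_table;
REINDEX DATABASE CONCURRENTLY some_db;  -- system catalogs reindexed normally
REINDEX SYSTEM CONCURRENTLY some_db;    -- rejected: "cannot reindex system concurrently"
```
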
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index c940897..abac9eb 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -778,16 +778,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -799,8 +803,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9f29003..ab45c67 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,16 +60,25 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
extern void index_concurrent_build(Oid heapOid,
Oid indexOid,
bool isprimary);
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
extern void index_concurrent_set_dead(Oid heapOid,
Oid indexOid,
LOCKTAG locktag);
+extern void index_concurrent_drop(Oid indexOid);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index fa9f41f..0b965da 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index de22dff..904fff4 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2544,6 +2544,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/test/isolation/expected/reindex-concurrently.out b/src/test/isolation/expected/reindex-concurrently.out
new file mode 100644
index 0000000..9e04169
--- /dev/null
+++ b/src/test/isolation/expected/reindex-concurrently.out
@@ -0,0 +1,78 @@
+Parsed test spec with 3 sessions
+
+starting permutation: reindex sel1 upd2 ins2 del2 end1 end2
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab;
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+
+starting permutation: sel1 reindex upd2 ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 reindex ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 reindex del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 reindex end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 end1 reindex end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end2: COMMIT;
+step reindex: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 081e11f..fb4c1a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -20,4 +20,5 @@ test: delete-abort-savept
test: delete-abort-savept-2
test: aborted-keyrevoke
test: drop-index-concurrently-1
+test: reindex-concurrently
test: timeouts
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..eb59fe0
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE CONCURRENTLY reind_con_tab; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 37dea0a..dcd1dc8 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2721,3 +2721,60 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations
+ERROR: concurrent reindex is not supported for catalog relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index d025cbc..d6a5ad5 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -912,3 +912,44 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On Thu, Jul 11, 2013 at 5:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
I am resending the patches after Fujii-san noticed a bug in the latest
code that allowed dropping even valid toast indexes... While looking at
that, I found a couple of other bugs:
- two bugs, now fixed, in the code path added in tablecmds.c to
allow the manual drop of invalid toast indexes:
-- Even a user with no permission on the parent toast table could
drop an invalid toast index
-- A lock on the parent toast relation was not taken, as is done
for all other indexes dropped with DROP INDEX
- Trying to concurrently reindex a mapped catalog leads to an error.
As mapped catalogs have no relfilenode, I think it makes sense to block
concurrent reindex in this case, so I modified the core patch accordingly.
This patch status has been changed to returned with feedback.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Hi,
I have been working a little more on this patch for the next
commit fest. Compared to the previous version, I have removed the code
where the process running REINDEX CONCURRENTLY waited, at the
validation and swap phases, for transactions holding a snapshot older
than the snapshot xmin of the reindexing process.
At the validation phase, there was a risk that the newly-validated
index might not contain tuples deleted just before the snapshot used
for validation was taken. I tried to break the code in this area by
playing with multiple sessions but couldn't. Feel free to try the code
and break it if you can!
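For anyone who wants to try, here is a minimal two-session sketch of the
kind of validation-phase race I was probing (table and index names are
invented for illustration):

```sql
-- Session 1: build some data, then start the concurrent rebuild
CREATE TABLE race_tab (id int PRIMARY KEY);
INSERT INTO race_tab SELECT generate_series(1, 100000);
REINDEX INDEX CONCURRENTLY race_tab_pkey;

-- Session 2: while session 1 runs, delete rows under an older snapshot
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM race_tab;          -- takes the session's snapshot
DELETE FROM race_tab WHERE id % 10 = 0;
COMMIT;

-- Afterwards, an index-only check should agree with the heap: the rows
-- deleted by session 2 must not be visible through the rebuilt index.
SELECT count(*) FROM race_tab WHERE id % 10 = 0;
```

Interleaving the two sessions at different points of the rebuild is the
interesting part; a single run proves little either way.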
At the swap phase, the process running REINDEX CONCURRENTLY needed to
wait for transactions that might still have needed the older index
information being swapped out. As the swap phase is now done with an
MVCC snapshot, this is no longer necessary.
Thanks to the removal of this code, I no longer see with this patch
the deadlocks that could occur when other sessions tried to take a
ShareUpdateExclusiveLock on the relation, with an ANALYZE for example.
So multiple backends can run REINDEX CONCURRENTLY or ANALYZE commands
in parallel without risk of deadlock. Processes will just wait for
locks as long as necessary.
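As a quick illustration of the no-deadlock claim, something like the
following can be run from two sessions at once against the regression
table used in the tests above; both commands take
ShareUpdateExclusiveLock, so one simply waits for the other instead of
deadlocking:

```sql
-- Session 1
REINDEX TABLE CONCURRENTLY concur_reindex_tab;

-- Session 2, started while session 1 is still running:
-- blocks until session 1 releases its lock, then proceeds
ANALYZE concur_reindex_tab;
```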
Regards,
--
Michael
Attachments:
20130827_0_procarray.patch (application/octet-stream)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b73ee4f..6f44cb2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1323,7 +1323,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1445,11 +1444,8 @@ index_drop(Oid indexId, bool concurrent)
/*
* Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
+ * index for a query. Note we do not need to worry about xacts that
+ * open the table for reading after this point; they will see the
* index as invalid when they open the relation.
*
* Note: the reason we use actual lock acquisition here, rather than
@@ -1457,18 +1453,8 @@ index_drop(Oid indexId, bool concurrent)
* possible if one of the transactions in question is blocked trying
* to acquire an exclusive lock on our table. The lock code will
* detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* No more predicate locks will be acquired on this index, and we're
@@ -1512,13 +1498,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 902daa0..26884b1 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -321,13 +321,9 @@ DefineIndex(IndexStmt *stmt,
IndexInfo *indexInfo;
int numberOfAttributes;
TransactionId limitXmin;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -652,10 +648,7 @@ DefineIndex(IndexStmt *stmt,
* for an overview of how this works)
*
* Now we must wait until no running transaction could have the table open
- * with the old list of indexes. To do this, inquire which xacts
- * currently would conflict with ShareLock on the table -- ie, which ones
- * have a lock that permits writing the table. Then wait for each of
- * these xacts to commit or abort. Note we do not need to worry about
+ * with the old list of indexes. Note we do not need to worry about
* xacts that open the table for writing after this point; they will see
* the new index when they open it.
*
@@ -664,18 +657,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -739,13 +722,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -786,74 +763,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(limitXmin);
/*
* Index can now be marked valid -- update its pg_index entry
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index c2f86ff..ac1f3ec 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2567,6 +2567,153 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the
+ * relations referred to by the given list of lock tags.  To do this,
+ * inquire which xacts currently would conflict with the given lockmode
+ * on each relation -- ie, which ones hold a lock that conflicts with
+ * it.  Then wait for each of these xacts to commit or abort.
+ *
+ * The same lockmode is applied to every lock tag in the list; callers
+ * pass the lock tags of the heap relations whose indexes are changing.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ /* Clean up */
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock tag.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * xmin limit, because the index built under that limit might not contain
+ * tuples deleted just before the snapshot was taken.  Obtain a list of
+ * VXIDs of such transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin given;
+ * their oldest snapshot must be newer than our xmin limit.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(TransactionId limitXmin)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index c5f58b4..4df51b0 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -77,4 +77,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(TransactionId limitXmin);
+
#endif /* PROCARRAY_H */
20130827_1_index_conc_struc.patch (application/octet-stream)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 6f44cb2..aed57f0 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1091,6 +1091,100 @@ index_create(Relation heapRelation,
}
/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Only low-level locks are
+ * taken here, so that schema changes are blocked but writes are not.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel, indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost at
+ * commit of the transaction in which this concurrent index was
+ * created at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
+{
+ Relation heapRelation, indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexOid, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
@@ -1442,50 +1536,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. Note we do not need to worry about xacts that
- * open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- */
- WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(heapId, indexId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 26884b1..29d7eea 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -311,7 +311,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -678,27 +677,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..9f29003 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -62,6 +62,14 @@ extern Oid index_create(Relation heapRelation,
bool concurrent,
bool is_internal);
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_set_dead(Oid heapOid,
+ Oid indexOid,
+ LOCKTAG locktag);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
20130827_2_reindex_concurrently_v30.patch (application/octet-stream)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index cefd323..2d7678b 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..5f42c4f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,22 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>,
+ including invalid toast indexes.
</para>
</listitem>
@@ -139,6 +152,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +259,115 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is done,
+ the old and fresh indexes are swapped. Finally, two additional transactions
+ are used to mark the swapped-out index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try to perform <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending in
+ the suffix <literal>_cct</>. This works as well for indexes of toast
+ relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Valid toast indexes cannot be
+ dropped, as each toast relation requires exactly one valid index.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> concurrently rebuilds only the non-system relations. System
+ relations are rebuilt non-concurrently. Toast indexes are rebuilt
+ concurrently if the relation they depend on is a non-system relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal>
+ lock on all the relations involved in the operation. When
+ <command>CONCURRENTLY</command> is specified, the operation is done with
+ <literal>SHARE UPDATE EXCLUSIVE</literal> locks instead.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +399,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild the indexes of a table while allowing read and write operations
+ on the relations involved:
+
+<programlisting>
+REINDEX TABLE my_broken_table CONCURRENTLY;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index aed57f0..461d685 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create the index as a duplicate of an existing
+ * index, as done during a concurrent reindex operation. The index can
+ * also belong to a toast relation. Sufficient locks are assumed to have
+ * already been taken on the related relations when this is called during
+ * a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently only supported during a concurrent index
+ * rebuild, but there is no way to ask for it in the grammar otherwise
+ * anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1090,6 +1100,190 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into the catalogs and needs to be
+ * built later on. This is called during concurrent reindex processing.
+ * The heap relation on which the index is based needs to be closed by
+ * the caller.
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get expressions associated with this index, needed to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the name picked has any conflict with existing names and
+ * change it.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, NoLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
/*
* index_concurrent_build
*
@@ -1128,6 +1322,65 @@ index_concurrent_build(Oid heapOid,
index_close(indexRelation, NoLock);
}
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap old index and new index in a concurrent context. For the time
+ * being this only switches the relfilenodes of the two indexes. If extra
+ * operations become necessary during a concurrent swap, they should be
+ * added here. The relations do not require an exclusive lock thanks to
+ * MVCC catalog access in the relcache.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take a necessary lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, ShareUpdateExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, ShareUpdateExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swap happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
/*
* index_concurrent_set_dead
*
@@ -1185,6 +1438,71 @@ index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
}
/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of concurrent index processing.
+ * Deletion is done through performDeletion, or the dependencies of the
+ * index would not get dropped. At this point all the indexes are already
+ * considered invalid and dead, so they can be dropped without using any
+ * concurrent options, as it is certain that they will not interact with
+ * other server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index dropped here is not alive; if it were, it might
+ * still be in use by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 29d7eea..1b4f001 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -449,7 +450,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -596,7 +598,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -777,6 +779,520 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation
+ * can be either an index or a table. If a table is specified, each phase
+ * is processed for all of the table's indexes at once, including its
+ * dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including its
+ * associated toast table indexes. If the relkind is an index, this index
+ * itself will be rebuilt. The locks taken on the parent relations and
+ * the involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session
+ * lock is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ case RELKIND_TOASTVALUE:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation;
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (IsSharedRelation(relationOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* A system catalog cannot be reindexed concurrently */
+ if (IsSystemNamespace(get_rel_namespace(relationOid)))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for catalog relations")));
+
+ /* Open relation to get its indexes */
+ heapRelation = heap_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create, for each index, a new index based on the same
+ * data as the former one, except that it will only be registered in the
+ * catalogs and built afterwards. It is possible to perform all these
+ * operations at the same time for all of a parent relation's indexes,
+ * including the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index's parent relation; it might be a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NIL,
+ NIL,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also needed
+ * on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save a palloc'd copy of the lockrelid to protect each concurrent
+ * relation from being dropped, then close the relations. A copy is
+ * needed because the list outlives this loop iteration and its local
+ * variable. The lockrelid of the parent relation is not saved here, to
+ * avoid taking multiple locks on the same relation; instead we rely on
+ * parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)), &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks for the following visibility checks; other
+ * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add a palloc'd copy of the lockrelid of the parent relation to the
+ * list of locked relations; the list outlives this loop iteration.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the relation, the
+ * concurrent index and its copy to ensure that none of them are dropped
+ * until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build the concurrent indexes in a separate transaction for each index
+ * to avoid keeping a transaction open for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * an old index. Before doing that, we need to wait on the parent
+ * relations until no running transaction could still have the parent
+ * table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Index relation has been closed by previous commit, so reopen it */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ primary = indexRel->rd_index->indisprimary;
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(indexRel->rd_index->indrelid,
+ concurrentOid,
+ primary);
+
+ /* Keep the relation open until the build is done */
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs
+ * that might have occurred in the parent table meanwhile.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done in a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for validating the
+ * concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate index, which might be a toast */
+ validate_index(relOid, indOid, snapshot);
+
+ /* And we can remove the validating snapshot too */
+ PopActiveSnapshot();
+ UnregisterSnapshot(snapshot);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes have been validated and could be
+ * used, we need to swap each concurrent index with its corresponding
+ * old index. Note that the concurrent index used for the swap is not
+ * marked as valid, because we need to keep the former index and the
+ * concurrent index in different validity states. This avoids an
+ * explosion in the number of indexes a parent relation could accumulate
+ * if this operation fails multiple times in a row for one reason or
+ * another. The validation step has already guaranteed that the new
+ * index contains all the entries needed.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ relOid = IndexGetRelation(indOid, false);
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes. Mark them as
+ * dead so that in-flight transactions stop using them. Each operation
+ * is performed in a separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation before
+ * setting the index as dead.
+ */
+ index_concurrent_set_dead(relOid, indOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion or related dependencies will not be dropped for the old
+ * indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not used
+ * as here the indexes are already considered dead and invalid, so they
+ * will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the concurrent index and its dependencies */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * The last thing to do is release the session-level locks on the
+ * parent table and its indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get a fresh snapshot for the end of the process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1439,7 +1955,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1465,6 +1982,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1577,18 +2101,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ false, false,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue with the concurrent or non-concurrent process */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1657,13 +2185,27 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ false, false,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid,
REINDEX_REL_PROCESS_TOAST |
@@ -1684,7 +2226,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1696,6 +2241,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A concurrent operation is not allowed on system catalogs, but it is
+ * allowed on a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1779,17 +2333,42 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid,
- REINDEX_REL_PROCESS_TOAST |
- REINDEX_REL_CHECK_CONSTRAINTS))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself relies
+ * on them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid,
+ REINDEX_REL_PROCESS_TOAST |
+ REINDEX_REL_CHECK_CONSTRAINTS);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index adc74dd..8aee327 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -874,6 +874,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
char relkind;
Form_pg_class classform;
LOCKMODE heap_lockmode;
+ bool invalid_system_index = false;
state = (struct DropRelationCallbackState *) arg;
relkind = state->relkind;
@@ -909,7 +910,37 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
rel->relname);
- if (!allowSystemTableMods && IsSystemClass(classform))
+ /*
+ * Check for a system index that might have been invalidated by a failed
+ * concurrent process, and allow it to be dropped. For the time being,
+ * this only concerns indexes of toast relations that became invalid
+ * during a REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Mark object as being an invalid index of system catalogs */
+ if (!indisvalid)
+ invalid_system_index = true;
+ }
+
+ /* In the case of an invalid index, it is fine to bypass this check */
+ if (!invalid_system_index && !allowSystemTableMods && IsSystemClass(classform))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied: \"%s\" is a system catalog",
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..5495f22 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1201,6 +1201,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist when created in a concurrent
+ * context, and this code path cannot be reached by CREATE INDEX
+ * CONCURRENTLY, as that feature is not available for exclusion
+ * constraints; hence only REINDEX CONCURRENTLY can take this path.
+ * In that case the same index exists in parallel to this one, so we
+ * can bypass this check: it has already been done on the other index.
+ * If exclusion constraints become supported by CREATE INDEX
+ * CONCURRENTLY in the future, this will need to be removed or
+ * revisited for that purpose.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 788907e..15f38a3 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3639,6 +3639,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 496e31d..38d32cc 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1848,6 +1848,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 22e82ba..b99652b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6770,29 +6770,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index c940897..abac9eb 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -778,16 +778,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -799,8 +803,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9f29003..ab45c67 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,16 +60,25 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
extern void index_concurrent_build(Oid heapOid,
Oid indexOid,
bool isprimary);
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
extern void index_concurrent_set_dead(Oid heapOid,
Oid indexOid,
LOCKTAG locktag);
+extern void index_concurrent_drop(Oid indexOid);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index fa9f41f..0b965da 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -26,10 +26,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 51fef68..4cde473 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2587,6 +2587,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/test/isolation/expected/reindex-concurrently.out b/src/test/isolation/expected/reindex-concurrently.out
new file mode 100644
index 0000000..9e04169
--- /dev/null
+++ b/src/test/isolation/expected/reindex-concurrently.out
@@ -0,0 +1,78 @@
+Parsed test spec with 3 sessions
+
+starting permutation: reindex sel1 upd2 ins2 del2 end1 end2
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab;
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+
+starting permutation: sel1 reindex upd2 ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 reindex ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 reindex del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 reindex end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 end1 reindex end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end2: COMMIT;
+step reindex: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 081e11f..fb4c1a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -20,4 +20,5 @@ test: delete-abort-savept
test: delete-abort-savept-2
test: aborted-keyrevoke
test: drop-index-concurrently-1
+test: reindex-concurrently
test: timeouts
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..eb59fe0
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE CONCURRENTLY reind_con_tab; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 81c64e5..a613227 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2741,3 +2741,60 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations
+ERROR: concurrent reindex is not supported for catalog relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 4ee8581..aacfb58 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -915,3 +915,44 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
I have been working a little bit more on this patch for the next
commit fest. Compared to the previous version, I have removed the part
of the code where process running REINDEX CONCURRENTLY was waiting for
transactions holding a snapshot older than the snapshot xmin of
process running REINDEX CONCURRENTLY at the validation and swap phase.
At the validation phase, there was a risk that the newly-validated
index might not contain deleted tuples before the snapshot used for
validation was taken. I tried to break the code in this area by
playing with multiple sessions but couldn't. Feel free to try the code
and break it if you can!
Hm. Do you have any justifications for removing those waits besides "I
couldn't break it"? The logic for the concurrent indexing is pretty
intricate and we've got it wrong a couple of times without noticing bugs
for a long while, so I am really uncomfortable with statements like this.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Tue, Aug 27, 2013 at 11:09 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
I have been working a little bit more on this patch for the next
commit fest. Compared to the previous version, I have removed the part
of the code where process running REINDEX CONCURRENTLY was waiting for
transactions holding a snapshot older than the snapshot xmin of
process running REINDEX CONCURRENTLY at the validation and swap phase.
At the validation phase, there was a risk that the newly-validated
index might not contain deleted tuples before the snapshot used for
validation was taken. I tried to break the code in this area by
playing with multiple sessions but couldn't. Feel free to try the code
and break it if you can!

Hm. Do you have any justifications for removing those waits besides "I
couldn't break it"? The logic for the concurrent indexing is pretty
intricate and we've got it wrong a couple of times without noticing bugs
for a long while, so I am really uncomfortable with statements like this.
Note that the waits on relation locks are not removed, only the wait
phases involving old snapshots.
During the swap phase, the process was waiting for transactions with older
snapshots than the one taken by transaction doing the swap as they
might hold the old index information. I think that we can get rid of
it thanks to MVCC snapshots, as other backends are now able to see
the correct index information to fetch.
After the new index validation, the index has all the necessary tuples;
however, it might not have taken into account tuples that were deleted
before the reference snapshot was taken. But in the case of REINDEX
CONCURRENTLY, the validated index is not marked as valid as it is in
CREATE INDEX CONCURRENTLY; the transaction doing the validation is
committed directly. The index is considered valid only after the swap
phase, when the relfilenodes are changed.
I am sure you will find some flaws in this reasoning though :). Of
course, not having been able to break this code so far with my picky
tests using targeted breakpoints does not mean that it will not fail
in some scenario, just that I could not break it yet.
Note also that removing those wait phases has the advantage of removing
the risk of deadlocks when an ANALYZE is run in parallel with REINDEX
CONCURRENTLY, as was the case in previous versions of the patch
(reproducible while waiting for old snapshots if a session takes
ShareUpdateExclusiveLock on the same relation in parallel).
--
Michael
On 2013-08-28 13:58:08 +0900, Michael Paquier wrote:
On Tue, Aug 27, 2013 at 11:09 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-08-27 15:34:22 +0900, Michael Paquier wrote:
I have been working a little bit more on this patch for the next
commit fest. Compared to the previous version, I have removed the part
of the code where process running REINDEX CONCURRENTLY was waiting for
transactions holding a snapshot older than the snapshot xmin of
process running REINDEX CONCURRENTLY at the validation and swap phase.
At the validation phase, there was a risk that the newly-validated
index might not contain deleted tuples before the snapshot used for
validation was taken. I tried to break the code in this area by
playing with multiple sessions but couldn't. Feel free to try the code
and break it if you can!

Hm. Do you have any justifications for removing those waits besides "I
couldn't break it"? The logic for the concurrent indexing is pretty
intricate and we've got it wrong a couple of times without noticing bugs
for a long while, so I am really uncomfortable with statements like this.

Note that the waits on relation locks are not removed, only the wait
phases involving old snapshots.

During the swap phase, the process was waiting for transactions with older
snapshots than the one taken by transaction doing the swap as they
might hold the old index information. I think that we can get rid of
it thanks to the MVCC snapshots as other backends are now able to see
what is the correct index information to fetch.
I don't see MVCC snapshots guaranteeing that. The only thing changed due
to them is that other backends see a self consistent picture of the
catalog (i.e. not either, neither or both versions of a tuple as
earlier). It can still be out of date, and we rely on it not being
out of date.
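[Editor's note] The distinction can be shown with a toy, single-process sketch (invented names, not PostgreSQL code): a snapshot copy of a catalog row is internally consistent, yet a later update leaves it out of date.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy "catalog row" for an index; field names are only illustrative. */
typedef struct
{
	bool		indisvalid;
	unsigned	relfilenode;
} ToyIndexRow;

static ToyIndexRow toy_catalog = {true, 100};

/* Taking a snapshot yields an internally consistent copy of the row. */
static ToyIndexRow
toy_take_snapshot(void)
{
	return toy_catalog;
}

/* A later update does not change copies already taken: the snapshot
 * stays self-consistent but becomes out of date. */
static void
toy_swap_relfilenode(unsigned newnode)
{
	toy_catalog.relfilenode = newnode;
}
```

A backend holding such a copy sees a coherent row, just not the current one.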
I need to look into the patch for more details.
Greetings,
Andres Freund
On Wed, Aug 28, 2013 at 9:02 AM, Andres Freund <andres@2ndquadrant.com> wrote:
During swap phase, process was waiting for transactions with older
snapshots than the one taken by transaction doing the swap as they
might hold the old index information. I think that we can get rid of
it thanks to the MVCC snapshots as other backends are now able to see
what is the correct index information to fetch.

I don't see MVCC snapshots guaranteeing that. The only thing changed due
to them is that other backends see a self consistent picture of the
catalog (i.e. not either, neither or both versions of a tuple as
earlier). It can still be out of date, and we rely on it not being
out of date.

I need to look into the patch for more details.
I agree with Andres. The only way in which the MVCC catalog snapshot
patch helps is that you can now do a transactional update on a system
catalog table without fearing that other backends will see the row as
nonexistent or duplicated. They will see exactly one version of the
row, just as you would naturally expect. However, a backend's
syscaches can still contain old versions of rows, and they can still
cache older versions of some tuples and newer versions of other
tuples. Those caches only get reloaded when shared-invalidation
messages are processed, and that only happens when the backend
acquires a lock on a new relation.
I have been of the opinion for some time now that the
shared-invalidation code is not a particularly good design for much of
what we need. Waiting for an old snapshot is often a proxy for
waiting long enough that we can be sure every other backend will
process the shared-invalidation message before it next uses any of the
cached data that will be invalidated by that message. However, it
would be better to be able to send invalidation messages in some way
that causes them to be processed more eagerly by other backends, and that
provides some more specific feedback on whether or not they have
actually been processed. Then we could send the invalidation
messages, wait just until everyone confirms that they have been seen,
which should hopefully happen quickly, and then proceed. This would
probably lead to much shorter waits. Or maybe we should have
individual backends process invalidations more frequently, and try to
set things up so that once an invalidation is sent, the sending
backend is immediately guaranteed that it will be processed soon
enough, and thus it doesn't need to wait at all. This is all pie in
the sky, though. I don't have a clear idea how to design something
that's an improvement over the (rather intricate) system we have
today.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2013-08-29 10:39:09 -0400, Robert Haas wrote:
I have been of the opinion for some time now that the
shared-invalidation code is not a particularly good design for much of
what we need. Waiting for an old snapshot is often a proxy for
waiting long enough that we can be sure every other backend will
process the shared-invalidation message before it next uses any of the
cached data that will be invalidated by that message. However, it
would be better to be able to send invalidation messages in some way
that causes them to be processed more eagerly by other backends, and that
provides some more specific feedback on whether or not they have
actually been processed. Then we could send the invalidation
messages, wait just until everyone confirms that they have been seen,
which should hopefully happen quickly, and then proceed.
Actually, the shared inval code already has that knowledge, doesn't it?
ISTM all we'd need is to have a queryable "sequence number" of SI
entries. Then one can simply wait till all backends have consumed up to
that id, keeping track of the furthest-behind backend in shmem.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,
Looking at this version of the patch now:
1) comment for "Phase 4 of REINDEX CONCURRENTLY" ends with an incomplete
sentence.
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
3) I am not sure if the swap algorithm used now actually is correct
either. We have mvcc snapshots now, right, but we're still potentially
taking separate snapshot for individual relcache lookups. What's
stopping us from temporarily ending up with two relcache entries with
the same relfilenode?
Previously you swapped names - I think that might end up being easier,
because having names temporarily confused isn't as bad as two indexes
manipulating the same file.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Sep 16, 2013 at 10:38 AM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-08-29 10:39:09 -0400, Robert Haas wrote:
I have been of the opinion for some time now that the
shared-invalidation code is not a particularly good design for much of
what we need. Waiting for an old snapshot is often a proxy for
waiting long enough that we can be sure every other backend will
process the shared-invalidation message before it next uses any of the
cached data that will be invalidated by that message. However, it
would be better to be able to send invalidation messages in some way
that causes them to be processed more eagerly by other backends, and that
provides some more specific feedback on whether or not they have
actually been processed. Then we could send the invalidation
messages, wait just until everyone confirms that they have been seen,
which should hopefully happen quickly, and then proceed.Actually, the shared inval code already has that knowledge, doesn't it?
ISTM all we'd need is to have a queryable "sequence number" of SI
entries. Then one can simply wait till all backends have consumed up to
that id, keeping track of the furthest-behind backend in shmem.
In theory, yes, but in practice, there are a few difficulties.
1. We're not in a huge hurry to ensure that sinval notifications are
delivered in a timely fashion. We know that sinval resets are bad, so
if a backend is getting close to needing a sinval reset, we kick it in
an attempt to get it to AcceptInvalidationMessages(). But if the
sinval queue isn't filling up, there's no upper bound on the amount of
time that can pass before a particular sinval is read. Therefore, the
amount of time that passes before an idle backend is forced to drain
the sinval queue can vary widely, from a fraction of a second to
minutes, hours, or days. So it's kind of unappealing to think about
making user-visible behavior dependent on how long it ends up taking.
2. Every time we add a new kind of sinval message, we increase the
frequency of sinval resets, and those are bad. So any notifications
that we choose to send this way had better be pretty low-volume.
Considering the foregoing points, it's unclear to me whether we should
try to improve sinval incrementally or replace it with something
completely new. I'm sure that the above-mentioned problems are
solvable, but I'm not sure how hairy it will be. On the other hand,
designing something new could be pretty hairy, too.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-09-17 16:34:37 -0400, Robert Haas wrote:
On Mon, Sep 16, 2013 at 10:38 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Actually, the shared inval code already has that knowledge, doesn't it?
ISTM all we'd need is to have a queryable "sequence number" of SI
entries. Then one can simply wait till all backends have consumed up to
that id, keeping track of the furthest-behind backend in shmem.
In theory, yes, but in practice, there are a few difficulties.
Agreed ;)
1. We're not in a huge hurry to ensure that sinval notifications are
delivered in a timely fashion. We know that sinval resets are bad, so
if a backend is getting close to needing a sinval reset, we kick it in
an attempt to get it to AcceptInvalidationMessages(). But if the
sinval queue isn't filling up, there's no upper bound on the amount of
time that can pass before a particular sinval is read. Therefore, the
amount of time that passes before an idle backend is forced to drain
the sinval queue can vary widely, from a fraction of a second to
minutes, hours, or days. So it's kind of unappealing to think about
making user-visible behavior dependent on how long it ends up taking.
Well, when we're signalling it's certainly faster than waiting for the
other's snapshot to vanish which can take ages for normal backends. And
we can signal when we wait for consumption without too many
problems.
Also, I think in most of the use cases we can simply not wait for any of
the idle backends; those don't use the old definition anyway.
2. Every time we add a new kind of sinval message, we increase the
frequency of sinval resets, and those are bad. So any notifications
that we choose to send this way had better be pretty low-volume.
In pretty much all the cases where I can see the need for something like
that, we already send sinval messages, so we should be able to
piggyback on those.
Considering the foregoing points, it's unclear to me whether we should
try to improve sinval incrementally or replace it with something
completely new. I'm sure that the above-mentioned problems are
solvable, but I'm not sure how hairy it will be. On the other hand,
designing something new could be pretty hairy, too.
I am pretty sure there's quite a bit to improve around sinvals but I
think any replacement would look surprisingly similar to what we
have. So I think doing it incrementally is more realistic.
And I am certainly scared by the thought of having to replace it without
breaking corner cases all over.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 17, 2013 at 7:04 PM, Andres Freund <andres@2ndquadrant.com> wrote:
1. We're not in a huge hurry to ensure that sinval notifications are
delivered in a timely fashion. We know that sinval resets are bad, so
if a backend is getting close to needing a sinval reset, we kick it in
an attempt to get it to AcceptInvalidationMessages(). But if the
sinval queue isn't filling up, there's no upper bound on the amount of
time that can pass before a particular sinval is read. Therefore, the
amount of time that passes before an idle backend is forced to drain
the sinval queue can vary widely, from a fraction of a second to
minutes, hours, or days. So it's kind of unappealing to think about
making user-visible behavior dependent on how long it ends up taking.
Well, when we're signalling it's certainly faster than waiting for the
other's snapshot to vanish which can take ages for normal backends. And
we can signal when we wait for consumption without too many
problems.
Also, I think in most of the use cases we can simply not wait for any of
the idle backends; those don't use the old definition anyway.
Possibly. It would need some thought, though.
I am pretty sure there's quite a bit to improve around sinvals but I
think any replacement would look surprisingly similar to what we
have. So I think doing it incrementally is more realistic.
And I am certainly scared by the thought of having to replace it without
breaking corner cases all over.
I guess I was more thinking that we might want some parallel mechanism
with somewhat different semantics. But that might be a bad idea
anyway. On the flip side, if I had any clear idea how to adapt the
current mechanism to suck less, I would have done it already.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
Sorry for the late reply; I am coming back to poking at this patch a
bit. One of the things I am still unhappy about with this patch is the
potential deadlocks that can come up when, for example, another backend
kicks off another operation taking ShareUpdateExclusiveLock (ANALYZE or
another REINDEX CONCURRENTLY) on the same relation as the one being
reindexed concurrently. This can happen because we need to wait at the
index validation phase, as a process might not have taken into account
tuples deleted before the reference snapshot was taken. I played a
little bit with a version of the code using no old-snapshot waiting,
but even if I couldn't break it directly, concurrent backends sometimes
took incorrect tuples from the heap. I unfortunately have no clear
solution for how to solve that... except making REINDEX CONCURRENTLY
fail when validating the concurrent index, with a clear error message
not referencing any deadlock, thereby giving priority to other
processes like for example ANALYZE, or other backends ready to kick off
another REINDEX CONCURRENTLY... Any ideas here are welcome; the
attached patch implements what is described here.
On Tue, Sep 17, 2013 at 12:37 AM, Andres Freund <andres@2ndquadrant.com> wrote:
Looking at this version of the patch now:
1) comment for "Phase 4 of REINDEX CONCURRENTLY" ends with an incomplete
sentence.
Oops, thanks.
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags, which sets indislive and *indisready* to false,
not indisvalid. The concurrent index never has indisvalid = true, so
it can never be used by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY, btw.
3) I am not sure if the swap algorithm used now actually is correct
either. We have mvcc snapshots now, right, but we're still potentially
taking separate snapshot for individual relcache lookups. What's
stopping us from temporarily ending up with two relcache entries with
the same relfilenode?
Previously you swapped names - I think that might end up being easier,
because having names temporarily confused isn't as bad as two indexes
manipulating the same file.
Actually, performing the swap operation with names proves to be more
difficult than it looks, as it requires a moment where both the old and
new indexes are marked as valid for all the backends. The reason for
that is that index_set_state_flags assumes that a given xact has not
yet done any transactional update when it is called, limiting to one
the number of state flags that can be changed inside a transaction.
This is a safe method IMO, and we shouldn't break it. Also, as far as
I understood, this is something that we *want* to avoid: a REINDEX
CONCURRENTLY process that fails could end up with twice the number of
valid indexes for a given relation if it is performed on a table (or
on an index if the reindex is done on an index). This is also a
requirement for toast indexes, where the new code assumes that a toast
relation can only have one single valid index at a time. For those
reasons the relfilenode approach is better.
Regards,
--
Michael
Attachments:
- 20130926_0_procarray.patch (application/octet-stream)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b73ee4f..6f44cb2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1323,7 +1323,6 @@ index_drop(Oid indexId, bool concurrent)
indexrelid;
LOCKTAG heaplocktag;
LOCKMODE lockmode;
- VirtualTransactionId *old_lockholders;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1445,11 +1444,8 @@ index_drop(Oid indexId, bool concurrent)
/*
* Now we must wait until no running transaction could be using the
- * index for a query. To do this, inquire which xacts currently would
- * conflict with AccessExclusiveLock on the table -- ie, which ones
- * have a lock of any kind on the table. Then wait for each of these
- * xacts to commit or abort. Note we do not need to worry about xacts
- * that open the table for reading after this point; they will see the
+ * index for a query. Note we do not need to worry about xacts that
+ * open the table for reading after this point; they will see the
* index as invalid when they open the relation.
*
* Note: the reason we use actual lock acquisition here, rather than
@@ -1457,18 +1453,8 @@ index_drop(Oid indexId, bool concurrent)
* possible if one of the transactions in question is blocked trying
* to acquire an exclusive lock on our table. The lock code will
* detect deadlock and error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need
- * not check for that. Also, prepared xacts are not reported, which
- * is fine since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* No more predicate locks will be acquired on this index, and we're
@@ -1512,13 +1498,7 @@ index_drop(Oid indexId, bool concurrent)
* Wait till every transaction that saw the old index state has
* finished. The logic here is the same as above.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
/*
* Re-open relations to allow us to complete our actions.
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 902daa0..26884b1 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -321,13 +321,9 @@ DefineIndex(IndexStmt *stmt,
IndexInfo *indexInfo;
int numberOfAttributes;
TransactionId limitXmin;
- VirtualTransactionId *old_lockholders;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -652,10 +648,7 @@ DefineIndex(IndexStmt *stmt,
* for an overview of how this works)
*
* Now we must wait until no running transaction could have the table open
- * with the old list of indexes. To do this, inquire which xacts
- * currently would conflict with ShareLock on the table -- ie, which ones
- * have a lock that permits writing the table. Then wait for each of
- * these xacts to commit or abort. Note we do not need to worry about
+ * with the old list of indexes. Note we do not need to worry about
* xacts that open the table for writing after this point; they will see
* the new index when they open it.
*
@@ -664,18 +657,8 @@ DefineIndex(IndexStmt *stmt,
* one of the transactions in question is blocked trying to acquire an
* exclusive lock on our table. The lock code will detect deadlock and
* error out properly.
- *
- * Note: GetLockConflicts() never reports our own xid, hence we need not
- * check for that. Also, prepared xacts are not reported, which is fine
- * since they certainly aren't going to do anything more.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* At this moment we are sure that there are no transactions with the
@@ -739,13 +722,7 @@ DefineIndex(IndexStmt *stmt,
* We once again wait until no transaction can have the table open with
* the index marked as read-only for updates.
*/
- old_lockholders = GetLockConflicts(&heaplocktag, ShareLock);
-
- while (VirtualTransactionIdIsValid(*old_lockholders))
- {
- VirtualXactLock(*old_lockholders, true);
- old_lockholders++;
- }
+ WaitForVirtualLocks(heaplocktag, ShareLock);
/*
* Now take the "reference snapshot" that will be used by validate_index()
@@ -786,74 +763,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOldSnapshots(limitXmin);
/*
* Index can now be marked valid -- update its pg_index entry
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index c2f86ff..ac1f3ec 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2567,6 +2567,153 @@ XidCacheRemoveRunningXids(TransactionId xid,
LWLockRelease(ProcArrayLock);
}
+
+/*
+ * WaitForMultipleVirtualLocks
+ *
+ * Wait until no transaction holds a conflicting lock on any of the given
+ * locktags.  To do this, inquire which xacts currently would conflict
+ * with lockmode on the relation referred to by each locktag -- ie,
+ * which ones hold a lock that conflicts with the given mode.  Then wait
+ * for each of these xacts to commit or abort.
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+void
+WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode)
+{
+ VirtualTransactionId **old_lockholders;
+ int i, count = 0;
+ ListCell *lc;
+
+ /* Leave if no locks to wait for */
+ if (list_length(locktags) == 0)
+ return;
+
+ old_lockholders = (VirtualTransactionId **)
+ palloc(list_length(locktags) * sizeof(VirtualTransactionId *));
+
+ /* Collect the transactions we need to wait on for each relation lock */
+ foreach(lc, locktags)
+ {
+ LOCKTAG *locktag = lfirst(lc);
+ old_lockholders[count++] = GetLockConflicts(locktag, lockmode);
+ }
+
+ /* Finally wait for each transaction to complete */
+ for (i = 0; i < count; i++)
+ {
+ VirtualTransactionId *lockholders = old_lockholders[i];
+
+ while (VirtualTransactionIdIsValid(*lockholders))
+ {
+ VirtualXactLock(*lockholders, true);
+ lockholders++;
+ }
+ }
+
+ /* Clean up */
+ pfree(old_lockholders);
+}
+
+
+/*
+ * WaitForVirtualLocks
+ *
+ * Similar to WaitForMultipleVirtualLocks, but for a single lock tag.
+ */
+void
+WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
+{
+ WaitForMultipleVirtualLocks(list_make1(&heaplocktag), lockmode);
+}
+
+
+/*
+ * WaitForOldSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * xmin limit, because the index built under that limit might not contain
+ * tuples deleted just before the reference snapshot was taken.  Obtain a
+ * list of VXIDs of such transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin given;
+ * their oldest snapshot must be newer than our xmin limit.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+void
+WaitForOldSnapshots(TransactionId limitXmin)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
#ifdef XIDCACHE_DEBUG
/*
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index c5f58b4..4df51b0 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -77,4 +77,8 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
int nxids, const TransactionId *xids,
TransactionId latestXid);
+extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
+extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
+extern void WaitForOldSnapshots(TransactionId limitXmin);
+
#endif /* PROCARRAY_H */
- 20130926_1_index_struct.patch (application/octet-stream)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 6f44cb2..aed57f0 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1091,6 +1091,100 @@ index_create(Relation heapRelation,
}
/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. Low-level locks are taken
+ * when this operation is performed, so that only schema changes are
+ * prevented, not reads or writes.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel, indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in
+ * commit of transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all the backends as dead.
+ */
+void
+index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
+{
+ Relation heapRelation, indexRelation;
+
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query if necessary.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForVirtualLocks(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexOid, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
@@ -1442,50 +1536,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. Note we do not need to worry about xacts that
- * open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- */
- WaitForVirtualLocks(heaplocktag, AccessExclusiveLock);
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(heapId, indexId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 26884b1..29d7eea 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -311,7 +311,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -678,27 +677,13 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation, NoLock, false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..9f29003 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -62,6 +62,14 @@ extern Oid index_create(Relation heapRelation,
bool concurrent,
bool is_internal);
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_set_dead(Oid heapOid,
+ Oid indexOid,
+ LOCKTAG locktag);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
[Attachment: 20130926_2_reindex_conc_v31.patch (application/octet-stream)]
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index cefd323..2d7678b 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..5f42c4f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,22 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ that back constraints. Other indexes can be dropped using <literal>DROP INDEX</>,
+ including invalid toast indexes.
</para>
</listitem>
@@ -139,6 +152,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +259,115 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is done,
+ the old and new indexes are swapped. Finally, two additional transactions
+ are used to mark the concurrent index as not ready and then drop it.
+ </para>
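For a plain, non-constraint index, the net effect is close to the manual workaround previously required; the following is an editor's sketch for illustration only (`tab`, `col`, `ind` and `ind_new` are placeholder names), whereas <command>REINDEX CONCURRENTLY</> instead swaps the relfilenodes internally and keeps the original index name throughout:

```sql
-- Approximate manual equivalent of REINDEX INDEX ind CONCURRENTLY:
CREATE INDEX CONCURRENTLY ind_new ON tab (col);  -- build replacement without blocking writes
DROP INDEX CONCURRENTLY ind;                     -- retire the old index
ALTER INDEX ind_new RENAME TO ind;               -- restore the original name
```

This manual sequence does not work for indexes backing constraints, which is part of the motivation for the built-in command.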
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and try again to perform <command>REINDEX CONCURRENTLY</>.
+ The concurrent index created during the processing has a name ending with
+ the suffix <literal>_cct</>. This also works for indexes of toast relations.
+ </para>
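Using the invalid index shown in the example above, recovery could look like this (a sketch; index names follow the example):

```sql
DROP INDEX idx_cct;              -- remove the leftover invalid index
REINDEX INDEX idx CONCURRENTLY;  -- retry the concurrent rebuild
```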
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
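A sketch of the transaction-block restriction (the table name is a placeholder):

```sql
BEGIN;
REINDEX TABLE tab CONCURRENTLY;  -- fails: cannot run inside a transaction block
ROLLBACK;
```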
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. Valid indexes, being unique
+ for a given toast relation, cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> concurrently rebuilds only the non-system relations. System
+ relations are rebuilt non-concurrently. Toast indexes are
+ rebuilt concurrently if the relation they depend on is a non-system
+ relation.
+
+ <para>
+ <command>REINDEX</command> takes an <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved in the operation. When <command>CONCURRENTLY</command>
+ is specified, a <literal>SHARE UPDATE EXCLUSIVE</literal> lock is used instead.
+ </para>
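A sketch of what the weaker lock level permits, using two concurrent sessions (names are placeholders):

```sql
-- Session 1: rebuild while holding only SHARE UPDATE EXCLUSIVE
REINDEX TABLE tab CONCURRENTLY;

-- Session 2, meanwhile: not blocked, since SHARE UPDATE EXCLUSIVE
-- does not conflict with the ROW EXCLUSIVE lock taken by DML
INSERT INTO tab (col) VALUES (1);
UPDATE tab SET col = 2 WHERE col = 1;
```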
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +399,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild the indexes of a table, allowing read and write operations on the
+ involved relations while the rebuild is in progress:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index aed57f0..461d685 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create the index as a duplicate of an existing
+ * index, as done during a concurrent reindex operation. The index can
+ * also belong to a toast relation. Sufficient locks are normally already
+ * held on the related relations when this is called during a concurrent
+ * operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently supported only during a concurrent index
+ * rebuild; there is no way to ask for it in the grammar otherwise.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1090,6 +1100,190 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into the catalogs and needs to be built
+ * afterwards. This is called during concurrent index processing. The heap
+ * relation on which the index is based must be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine whether the index is initially deferred; this depends on
+ * its associated constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, to build the column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the name picked has any conflict with existing names and
+ * change it.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, NoLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
/*
* index_concurrent_build
*
@@ -1128,6 +1322,65 @@ index_concurrent_build(Oid heapOid,
index_close(indexRelation, NoLock);
}
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap the old and new indexes in a concurrent context. For the time being
+ * this only switches the relfilenode of the two indexes. If extra operations
+ * become necessary during a concurrent swap, processing should be added
+ * here. No exclusive lock is required on the relations thanks to MVCC
+ * catalog access.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take the necessary locks on the old and new indexes before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, ShareUpdateExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, ShareUpdateExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swap happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The locks taken previously are not released until the end of the transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
/*
* index_concurrent_set_dead
*
@@ -1185,6 +1438,71 @@ index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
}
/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of concurrent index processing.
+ * Deletion is done through performDeletion; otherwise the dependencies of
+ * the index would not get dropped. At this point all the indexes are
+ * already considered invalid and dead, so they can be dropped without
+ * using any concurrent options, as it is certain that they will not
+ * interact with other server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index to be dropped is not alive; if it were, it might
+ * still be used by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 29d7eea..e4e90f7 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -449,7 +450,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -596,7 +598,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -750,7 +752,7 @@ DefineIndex(IndexStmt *stmt,
* before the reference snap was taken, we have to wait out any
* transactions that might have older snapshots.
*/
- WaitForOldSnapshots(limitXmin);
+ WaitForOldSnapshots(limitXmin, true);
/*
* Index can now be marked valid -- update its pg_index entry
@@ -777,6 +779,546 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation
+ * can be either an index or a table. If a table is specified, each
+ * reindexing step is performed on all of its indexes at once, as well as
+ * on its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including its
+ * associated toast table indexes. If the relkind is an index, the index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before the session lock
+ * is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ case RELKIND_TOASTVALUE:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation;
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (IsSharedRelation(relationOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* A system catalog cannot be reindexed concurrently */
+ if (IsSystemNamespace(get_rel_namespace(relationOid)))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for catalog relations")));
+
+ /* Open relation to get its indexes */
+ heapRelation = heap_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex invalid index \"%s.%s\" concurrently, skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return error if type of relation is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. We
+ * first need to create an index which is based on the same data as the
+ * former index, except that it will only be registered in the catalogs
+ * and built afterwards. It is possible to perform all these operations
+ * on all the indexes of a parent relation at the same time, including
+ * the indexes of its toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the index parent relation, might be a toast or parent relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ false,
+ false,
+ false,
+ true);
+
+ /* Create concurrent index based on given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the relation of the concurrent index; a lock is also
+ * needed on it.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save a palloc'd copy of the lockrelid of each index to protect it
+ * from being dropped, then close the relations; a copy is needed
+ * because the list outlives this loop iteration. The lockrelid of the
+ * parent relation is not taken here to avoid locking the same relation
+ * multiple times; instead we rely on parentRelationIds built earlier.
+ */
+ lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)),
+ &lockrelid, sizeof(LockRelId)));
+ lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)),
+ &lockrelid, sizeof(LockRelId)));
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks for the following visibility checks, as other
+ * backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /* Add a palloc'd copy of the parent's lockrelid to the list of locked relations */
+ relationLocks = lappend(relationLocks,
+ memcpy(palloc(sizeof(LockRelId)),
+ &lockrelid, sizeof(LockRelId)));
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid.dbId, lockrelid.relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to the other transactions before actually building the index.
+ * This will prevent them from making incompatible HOT updates. The index
+ * is marked as not ready and invalid so that no other transactions will
+ * try to use it for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on the parent relation,
+ * the old index and its concurrent copy, to ensure that none of them
+ * is dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build concurrent indexes in a separate transaction for each index to
+ * avoid having open transactions for an unnecessarily long time. A
+ * concurrent build is done for each concurrent index that will replace
+ * the old indexes. Before doing that, we need to wait until no running
+ * transaction could have the parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start new transaction for this index concurrent build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * Index relation has been closed by previous commit, so reopen it.
+ * Save the fields needed before closing; the relcache entry must not
+ * be dereferenced after index_close.
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapOid = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform concurrent build of new index */
+ index_concurrent_build(heapOid, concurrentOid, primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any INSERTs
+ * that might have occurred in the parent table.
+ *
+ * We once again wait until no transaction can have the table open with
+ * the index marked as read-only for updates. Each index validation is
+ * done with a separate transaction to avoid keeping a transaction open
+ * for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForMultipleVirtualLocks(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ TransactionId limitXmin;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for the validation of
+ * the concurrent index.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /*
+ * We can now do away with our active snapshot, but we still need to
+ * save the xmin limit to wait for older snapshots.
+ */
+ limitXmin = snapshot->xmin;
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * The concurrent index is now valid as it contains all the necessary
+ * tuples. However, it might not have taken into account tuples deleted
+ * before the reference snapshot was taken, so we need to wait for the
+ * transactions that might have older snapshots than ours.
+ */
+ if (!WaitForOldSnapshots(limitXmin, false))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent index validation failed while waiting for other virtual transactions"),
+ errhint("Check that no other concurrent operation is running in parallel")));
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes have been validated and could be
+ * used, we need to swap each concurrent index with its corresponding
+ * old index. Note that the concurrent index used for swapping is not
+ * marked as valid, because we need to keep the former index and the
+ * concurrent index with different valid statuses to avoid an explosion
+ * in the number of indexes a parent relation could have if this
+ * operation fails multiple times in a row for one reason or another.
+ * The validation step has already guaranteed that the concurrent
+ * indexes contain all the necessary tuples.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ relOid = IndexGetRelation(indOid, false);
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes; they need to be marked as dead after waiting for the
+ * transactions that might still use them. Each operation is performed
+ * in a separate transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the locktag of the parent table for this index; we need to wait
+ * for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation before
+ * setting the index as dead.
+ */
+ index_concurrent_set_dead(relOid, indOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or related dependencies will not be dropped for the
+ * old indexes. The internal mechanism of DROP INDEX CONCURRENTLY is not
+ * used here as the indexes are already considered dead and invalid, so
+ * they will not be used by other backends.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop this concurrent index, which is already dead and invalid */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Last thing to do is release the session-level lock on the parent
+ * table and on the indexes of the table.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish the process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
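For readers following the phase comments above, here is a small runnable sketch of the state transitions the six phases drive each index through. This is plain Python for illustration only, not PostgreSQL code; the function name, the "_cct" suffix handling and the state strings are simplifications.

```python
# Illustrative simulation of the six phases of REINDEX CONCURRENTLY.
# Each phase runs in its own transaction; states are simplified strings.

def reindex_concurrently(indexes):
    """Simulate the phase-by-phase state changes for a list of index names."""
    # Phase 1: create a _cct copy of each index, catalog-only, invalid/not-ready
    state = {}
    for idx in indexes:
        state[idx] = "valid,ready"
        state[idx + "_cct"] = "invalid,not-ready"

    # Phase 2: build each copy, then mark it as ready for inserts
    for idx in indexes:
        state[idx + "_cct"] = "invalid,ready"

    # Phase 3: validate each copy against a reference snapshot
    for idx in indexes:
        state[idx + "_cct"] = "valid,ready"

    # Phase 4: swap old and new; the _cct entry now holds the old data and
    # stays invalid so failed retries cannot accumulate valid duplicates
    for idx in indexes:
        state[idx], state[idx + "_cct"] = state[idx + "_cct"], "invalid,ready"

    # Phase 5: mark the old data (now under the _cct name) as dead
    for idx in indexes:
        state[idx + "_cct"] = "invalid,dead"

    # Phase 6: drop the dead entries through dependency-aware deletion
    for idx in indexes:
        del state[idx + "_cct"]
    return state

print(reindex_concurrently(["ind"]))  # {'ind': 'valid,ready'}
```

The invariant the sketch captures is that at every commit boundary, each name points at an index in a usable, self-consistent state, so concurrent sessions never see a half-built index as valid.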
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1439,7 +1981,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1465,6 +2008,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1577,18 +2127,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ concurrent, concurrent,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
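The concurrent index name is simply the old index name with a "_cct" suffix, as chosen by ChooseIndexName above. A rough sketch of that naming rule, assuming makeObjectName-style truncation to NAMEDATALEN - 1 bytes (the Python helper name is invented):

```python
# Illustrative sketch of concurrent index naming: append the "cct" label,
# truncating the base name so the identifier fits in NAMEDATALEN - 1 bytes.
# This is a simplification of ChooseRelationName/makeObjectName, not the
# actual C implementation.

NAMEDATALEN = 64

def choose_concurrent_name(index_name, label="cct"):
    max_base = NAMEDATALEN - 1 - len(label) - 1  # room for '_' and the label
    return index_name[:max_base] + "_" + label

print(choose_concurrent_name("ind"))  # ind_cct
```

Truncation matters because an index whose name is already near the 63-byte limit must still yield a distinct, valid identifier for its concurrent copy.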
@@ -1657,13 +2211,27 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ concurrent, concurrent,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid,
REINDEX_REL_PROCESS_TOAST |
@@ -1684,7 +2252,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1696,6 +2267,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for system catalogs, but it
+ * is allowed for a database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1779,17 +2359,42 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid,
- REINDEX_REL_PROCESS_TOAST |
- REINDEX_REL_CHECK_CONSTRAINTS))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself might
+ * use them. This does not include toast relations, which are
+ * reindexed when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid,
+ REINDEX_REL_PROCESS_TOAST |
+ REINDEX_REL_CHECK_CONSTRAINTS);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8839f98..400e7b4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -874,6 +874,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
char relkind;
Form_pg_class classform;
LOCKMODE heap_lockmode;
+ bool invalid_system_index = false;
state = (struct DropRelationCallbackState *) arg;
relkind = state->relkind;
@@ -909,7 +910,37 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
rel->relname);
- if (!allowSystemTableMods && IsSystemClass(classform))
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Mark object as being an invalid index of system catalogs */
+ if (!indisvalid)
+ invalid_system_index = true;
+ }
+
+ /* In the case of an invalid index, it is fine to bypass this check */
+ if (!invalid_system_index && !allowSystemTableMods && IsSystemClass(classform))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied: \"%s\" is a system catalog",
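The check added to RangeVarCallbackForDropRelation boils down to a small boolean decision: an invalid system index (left over by a failed REINDEX CONCURRENTLY on a toast relation) may be dropped even without allowSystemTableMods. A sketch in illustrative Python (invented function name, simplified inputs):

```python
# Illustrative sketch of the drop-permission decision: only an invalid
# system index bypasses the usual "system catalog" restriction.

def may_drop(is_system, is_index, indisvalid, allow_system_table_mods):
    invalid_system_index = is_system and is_index and not indisvalid
    if not invalid_system_index and not allow_system_table_mods and is_system:
        return False  # "permission denied: ... is a system catalog"
    return True

# A valid system catalog still cannot be dropped...
assert may_drop(True, False, True, False) is False
# ...but an invalid system index can be.
assert may_drop(True, True, False, False) is True
```

This keeps the normal protection for system catalogs intact while letting users clean up the one kind of debris a failed concurrent reindex can leave behind.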
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..5495f22 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1201,6 +1201,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * As an invalid index only exists when created in a concurrent context,
+ * and that this code path cannot be taken by CREATE INDEX CONCURRENTLY
+ * as this feature is not available for exclusion constraints, this code
+ * path can only be taken by REINDEX CONCURRENTLY. In this case the same
+ * index exists in parallel to this one so we can bypass this check as
+ * it has already been done on the other index existing in parallel.
+ * If exclusion constraints are supported in the future for CREATE INDEX
+ * CONCURRENTLY, this should be removed or completed especially for this
+ * purpose.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65f3b98..25324cd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3640,6 +3640,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 4c9b05e..d1e73fc 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1849,6 +1849,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a9812af..87e2a8b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6789,29 +6789,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index ac1f3ec..fbebbec 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2640,7 +2640,7 @@ WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
* Wait for transactions that might have older snapshot than the given xmin
* limit, because it might not contain tuples deleted just before it has
* been taken. Obtain a list of VXIDs of such transactions, and wait for them
- * individually.
+ * individually, or return false to the caller if no wait was performed.
*
* We can exclude any running transactions that have xmin > the xmin given;
* their oldest snapshot must be newer than our xmin limit.
@@ -2667,8 +2667,8 @@ WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode)
* GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
* doesn't show up in the output, we know we can forget about it.
*/
-void
-WaitForOldSnapshots(TransactionId limitXmin)
+bool
+WaitForOldSnapshots(TransactionId limitXmin, bool wait)
{
int i, n_old_snapshots;
VirtualTransactionId *old_snapshots;
@@ -2679,6 +2679,8 @@ WaitForOldSnapshots(TransactionId limitXmin)
for (i = 0; i < n_old_snapshots; i++)
{
if (!VirtualTransactionIdIsValid(old_snapshots[i]))
continue; /* found uninteresting in previous cycle */
@@ -2709,8 +2711,18 @@ WaitForOldSnapshots(TransactionId limitXmin)
}
if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
+ {
+ /*
+ * If this VXID is still running and the caller has chosen not to
+ * wait, simply return a false status.
+ */
+ if (!VirtualXactLock(old_snapshots[i], wait) && !wait)
+ return false;
+ }
}
+
+ /* No problem while waiting for other virtual XIDs */
+ return true;
}
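The modified WaitForOldSnapshots contract can be sketched as follows (illustrative Python with invented names): with wait=false it returns false as soon as an old-snapshot VXID turns out to still be running, instead of blocking on it; with wait=true it behaves as before and always returns true.

```python
# Illustrative sketch of the wait/no-wait contract of WaitForOldSnapshots.
# old_vxids: VXIDs whose snapshots might be older than our xmin limit;
# still_running: the subset of those still active at lock time.

def wait_for_old_snapshots(old_vxids, still_running, wait):
    for vxid in old_vxids:
        if vxid in still_running:
            if not wait:
                return False          # caller chose not to block
            still_running.discard(vxid)  # pretend the blocking wait succeeded
    return True

print(wait_for_old_snapshots([1, 2], {2}, wait=False))  # False
```

This mirrors the `if (!VirtualXactLock(old_snapshots[i], wait) && !wait) return false;` logic: the no-wait caller gets an early failure status it can turn into an error, as the concurrent validation phase does.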
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1023c4..1843386 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -778,16 +778,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -799,8 +803,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9f29003..ab45c67 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,16 +60,25 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
extern void index_concurrent_build(Oid heapOid,
Oid indexOid,
bool isprimary);
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
extern void index_concurrent_set_dead(Oid heapOid,
Oid indexOid,
LOCKTAG locktag);
+extern void index_concurrent_drop(Oid indexOid);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 836c99e..d78a63e 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -27,10 +27,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 51fef68..4cde473 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2587,6 +2587,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/include/storage/procarray.h b/src/include/storage/procarray.h
index 4df51b0..47cc286 100644
--- a/src/include/storage/procarray.h
+++ b/src/include/storage/procarray.h
@@ -79,6 +79,6 @@ extern void XidCacheRemoveRunningXids(TransactionId xid,
extern void WaitForMultipleVirtualLocks(List *locktags, LOCKMODE lockmode);
extern void WaitForVirtualLocks(LOCKTAG heaplocktag, LOCKMODE lockmode);
-extern void WaitForOldSnapshots(TransactionId limitXmin);
+extern bool WaitForOldSnapshots(TransactionId limitXmin, bool wait);
#endif /* PROCARRAY_H */
diff --git a/src/test/isolation/expected/reindex-concurrently.out b/src/test/isolation/expected/reindex-concurrently.out
new file mode 100644
index 0000000..9e04169
--- /dev/null
+++ b/src/test/isolation/expected/reindex-concurrently.out
@@ -0,0 +1,78 @@
+Parsed test spec with 3 sessions
+
+starting permutation: reindex sel1 upd2 ins2 del2 end1 end2
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab;
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+
+starting permutation: sel1 reindex upd2 ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 reindex ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 reindex del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 reindex end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 end1 reindex end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end2: COMMIT;
+step reindex: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 081e11f..fb4c1a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -20,4 +20,5 @@ test: delete-abort-savept
test: delete-abort-savept-2
test: aborted-keyrevoke
test: drop-index-concurrently-1
+test: reindex-concurrently
test: timeouts
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..eb59fe0
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE CONCURRENTLY reind_con_tab; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 81c64e5..a613227 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2741,3 +2741,60 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations
+ERROR: concurrent reindex is not supported for catalog relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 4ee8581..aacfb58 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -915,3 +915,44 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags that sets indislive and *indisready* to false,
not indisvalid. The concurrent index never uses indisvalid = true so
it can never be called by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
2) now wait for all transactions that potentially still can use the index for
*querying* to finish. During that indisready *must* be true,
otherwise the index will have outdated contents.
3) Mark the index as indislive = false, indisready = false. Anything
using a newer relcache entry will now not update the index.
4) Wait till all potential updaters of the index have finished.
5) Drop the index.
With the patch's current scheme concurrent queries that use plans using
the old index will get wrong results (at least in read committed)
because concurrent writers will not update it anymore since it's marked
indisready = false.
This isn't a problem of the *new* index, it's a problem of the *old*
one.
Am I missing something?
3) I am not sure if the swap algorithm used now actually is correct
either. We have mvcc snapshots now, right, but we're still potentially
taking separate snapshot for individual relcache lookups. What's
stopping us from temporarily ending up with two relcache entries with
the same relfilenode?
Previously you swapped names - I think that might end up being easier,
because having names temporarily confused isn't as bad as two indexes
manipulating the same file.
Actually, performing the swap operation with names proves to be more
difficult than it looks, as it requires a window where both the old and
new indexes are marked as valid for all the backends.
But that doesn't make the current method correct, does it?
The only
reason for that is that index_set_state_flags assumes that a given xact
has not yet done any transactional update when it is called, limiting
to one the number of state flags that can be changed inside a
transaction. This is a safe method IMO, and we shouldn't break that.
Part of that reasoning comes from the non-mvcc snapshot days, so it's
not really up to date anymore.
Even if you don't want to go through all that logic - which I'd
understand quite well - you can just do it like:
1) start with: old index: valid, ready, live; new index: invalid, ready, live
2) one transaction: switch names from real_name => tmp_name, new_name =>
real_name
3) one transaction: mark real_name (which is the rebuilt index) as valid,
and new_name as invalid
Now, if we fail in the midst of 3, we'd have two indexes marked as
valid. But that's unavoidable as long as you don't want to use
transactions.
Alternatively you could pass in a flag to use transactional updates,
that should now be safe.
At least, unless the old index still has "indexcheckxmin = true" with an
xmin that's not old enough. But in that case we cannot do the concurrent
reindex at all I think since we rely on the old index being fully valid.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags that sets indislive and *indisready* to false,
not indisvalid. The concurrent index never uses indisvalid = true so
it can never be called by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
indisvalid is never set to true for the concurrent index. The swap is
done with the concurrent index having indisvalid = false and the former
index with indisvalid = true. The concurrent index is validated with
index_validate in a transaction before the swap transaction.
--
Michael
On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags that sets indislive and *indisready* to false,
not indisvalid. The concurrent index never uses indisvalid = true so
it can never be called by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
indisvalid is never set to true for the concurrent index. Swap is done
with concurrent index having indisvalid = false and former index with
indisvalid = true. The concurrent index is validated with
index_validate in a transaction before swap transaction.
Yes. I've described how it *has* to be done, not how it's done.
The current method of going straight to isready = false for the original
index will result in wrong results because it's not updated anymore
while it's still being used.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags that sets indislive and *indisready* to false,
not indisvalid. The concurrent index never uses indisvalid = true so
it can never be called by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
indisvalid is never set to true for the concurrent index. Swap is done
with concurrent index having indisvalid = false and former index with
indisvalid = true. The concurrent index is validated with
index_validate in a transaction before swap transaction.
Yes. I've described how it *has* to be done, not how it's done.
The current method of going straight to isready = false for the original
index will result in wrong results because it's not updated anymore
while it's still being used.
The index being dropped at the end of the process is not the former
index, but the concurrent index. The index used after REINDEX
CONCURRENTLY is the old index, but with the new relfilenode.
Am I lacking caffeine? It looks so...
--
Michael
On 2013-09-26 20:47:33 +0900, Michael Paquier wrote:
On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags that sets indislive and *indisready* to false,
not indisvalid. The concurrent index never uses indisvalid = true so
it can never be called by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
indisvalid is never set to true for the concurrent index. Swap is done
with concurrent index having indisvalid = false and former index with
indisvalid = true. The concurrent index is validated with
index_validate in a transaction before swap transaction.
Yes. I've described how it *has* to be done, not how it's done.
The current method of going straight to isready = false for the original
index will result in wrong results because it's not updated anymore
while it's still being used.
The index being dropped at the end of process is not the former index,
but the concurrent index. The index used after REINDEX CONCURRENTLY is
the old index but with the new relfilenode.
That's not relevant unless I miss something.
After phase 4 both indexes are valid (although only the old one is
flagged as such), but due to the switching of the relfilenodes, backends
could have either of the two open, depending on when they built the
relcache entry. Right?
Then you go ahead and mark the old index - which still might be used! -
as dead in phase 5. Which means other backends might (again, depending
on the time they have built the relcache entry) not update it
anymore. In read committed we very well might go ahead and use the index
with the same plan as before, but with a new snapshot. Which now will
miss entries.
Am I misunderstanding the algorithm you're using?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 26, 2013 at 8:56 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 20:47:33 +0900, Michael Paquier wrote:
On Thu, Sep 26, 2013 at 8:43 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 20:40:40 +0900, Michael Paquier wrote:
On Thu, Sep 26, 2013 at 7:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2013-09-26 12:13:30 +0900, Michael Paquier wrote:
2) I don't think the drop algorithm used now is correct. Your
index_concurrent_set_dead() sets both indisvalid = false and indislive =
false at the same time. It does so after doing a WaitForVirtualLocks() -
but that's not sufficient. Between waiting and setting indisvalid =
false another transaction could start which then would start using that
index. Which will not get updated anymore by other concurrent backends
because of indislive = false.
You really need to follow index_drop's lead here and first unset
indisvalid then wait till nobody can use the index for querying anymore
and only then unset indislive.
Sorry, I do not follow you here. index_concurrent_set_dead calls
index_set_state_flags that sets indislive and *indisready* to false,
not indisvalid. The concurrent index never uses indisvalid = true so
it can never be called by another backend for a read query. The drop
algorithm is made to be consistent with DROP INDEX CONCURRENTLY btw.
That makes it even worse... You can do the concurrent drop only in the
following steps:
1) set indisvalid = false, no future relcache lookups will have it as valid
indisvalid is never set to true for the concurrent index. Swap is done
with concurrent index having indisvalid = false and former index with
indisvalid = true. The concurrent index is validated with
index_validate in a transaction before swap transaction.
Yes. I've described how it *has* to be done, not how it's done.
The current method of going straight to isready = false for the original
index will result in wrong results because it's not updated anymore
while it's still being used.
The index being dropped at the end of process is not the former index,
but the concurrent index. The index used after REINDEX CONCURRENTLY is
the old index but with the new relfilenode.
That's not relevant unless I miss something.
After phase 4 both indexes are valid (although only the old one is
flagged as such), but due to the switching of the relfilenodes backends
could have either of both open, depending on the time they built the
relcache entry. Right?
Then you go ahead and mark the old index - which still might be used! -
as dead in phase 5. Which means other backends might (again, depending
on the time they have built the relcache entry) not update it
anymore. In read committed we very well might go ahead and use the index
with the same plan as before, but with a new snapshot. Which now will
miss entries.
In this case, a call to WaitForOldSnapshots after the swap phase should
be enough. It was included in past versions of the patch but removed
in the last 2 versions.
Btw, taking the problem from another viewpoint... This feature now has
3 patches, the first 2 doing only code refactoring. Would it be
possible to have a look at those first? Straightforward things should
go first, simplifying the evaluation of the core feature.
Regards,
--
Michael
On 2013-09-27 05:41:26 +0900, Michael Paquier wrote:
In this case, doing a call to WaitForOldSnapshots after the swap phase
is enough. It was included in past versions of the patch but removed
in the last 2 versions.
I don't think it is. I really, really suggest following the protocol
used by index_drop to a T and documenting every *slight* deviation
carefully.
We've had more than one bug in index_drop's concurrent feature.
Btw, taking the problem from another viewpoint... This feature has now
3 patches, the 2 first patches doing only code refactoring. Could it
be possible to have a look at those ones first? Straight-forward
things should go first, simplifying the core feature evaluation.
I haven't looked at them in detail, but they looked good on a quick
pass. I'll make another pass, but that won't be before, say, Tuesday.
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Michael Paquier wrote:
Btw, taking the problem from another viewpoint... This feature has now
3 patches, the 2 first patches doing only code refactoring. Could it
be possible to have a look at those ones first? Straight-forward
things should go first, simplifying the core feature evaluation.
I have pushed the first half of the first patch for now, revising it
somewhat: I renamed the functions and put them in lmgr.c instead of
procarray.c.
I think the second half of that first patch (WaitForOldSnapshots) should
be in index.c, not procarray.c either. I didn't look at the actual code
in there.
I already shipped Michael fixed versions of the remaining patches
adjusting them to the changed API. I expect him to post them here.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 2, 2013 at 6:06 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
I have pushed the first half of the first patch for now, revising it
somewhat: I renamed the functions and put them in lmgr.c instead of
procarray.c.
Great thanks.
I think the second half of that first patch (WaitForOldSnapshots) should
be in index.c, not procarray.c either. I didn't look at the actual code
in there.
That's indexcmds.c in this case, not index.c.
I already shipped Michael fixed versions of the remaining patches
adjusting them to the changed API. I expect him to post them here.
And here they are attached, with the following changes:
- in 0002, WaitForOldSnapshots is renamed to WaitForOlderSnapshots.
This sounds better...
- in 0003, it looks like there was an error in obtaining the parent
table Oid when calling index_concurrent_heap. I believe that the lock
that needs to be taken for RangeVarGetRelid is not NoLock but
ShareUpdateExclusiveLock, so I changed it this way. I also added some
more comments at the top of each function for clarity.
- in 0004, patch is updated to reflect the API changes done in 0002 and 0003.
Each patch applied on top of its predecessors compiles, has no
warnings AFAIK, and passes regression/isolation tests. Finishing 0004
by the end of the CF seems out of reach IMO, so I'd suggest focusing
on 0002 and 0003 now, and I can put some time into finalizing them for
this CF. I think we should perhaps split 0003 into 2 pieces, with one
patch for the introduction of index_concurrent_build, and another for
index_concurrent_set_dead. Comments are welcome about that though, and
if people agree I'll do it once 0002 is finalized.
Regards,
--
Michael
Attachments:
20131002_0002_WaitForOlderSnapshots.patch (application/octet-stream)
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2155252..fe72613 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -277,6 +277,86 @@ CheckIndexCompatible(Oid oldId,
}
/*
+ * WaitForOlderSnapshots
+ *
+ * Wait for transactions that might have an older snapshot than the given
+ * xmin limit, because the index built under that limit might not contain
+ * tuples deleted just before the reference snapshot was taken. Obtain a
+ * list of VXIDs of such transactions, and wait for them individually.
+ *
+ * We can exclude any running transactions that have xmin > the xmin given;
+ * their oldest snapshot must be newer than our xmin limit.
+ * We can also exclude any transactions that have xmin = zero, since they
+ * evidently have no live snapshot at all (and any one they might be in
+ * process of taking is certainly newer than ours). Transactions in other
+ * DBs can be ignored too, since they'll never even be able to see this
+ * index.
+ *
+ * We can also exclude autovacuum processes and processes running manual
+ * lazy VACUUMs, because they won't be fazed by missing index entries
+ * either. (Manual ANALYZEs, however, can't be excluded because they
+ * might be within transactions that are going to do arbitrary operations
+ * later.)
+ *
+ * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
+ * check for that.
+ *
+ * If a process goes idle-in-transaction with xmin zero, we do not need to
+ * wait for it anymore, per the above argument. We do not have the
+ * infrastructure right now to stop waiting if that happens, but we can at
+ * least avoid the folly of waiting when it is idle at the time we would
+ * begin to wait. We do this by repeatedly rechecking the output of
+ * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
+ * doesn't show up in the output, we know we can forget about it.
+ */
+static void
+WaitForOlderSnapshots(TransactionId limitXmin)
+{
+ int i, n_old_snapshots;
+ VirtualTransactionId *old_snapshots;
+
+ old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_old_snapshots);
+
+ for (i = 0; i < n_old_snapshots; i++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[i]))
+ continue; /* found uninteresting in previous cycle */
+
+ if (i > 0)
+ {
+ /* see if anything's changed ... */
+ VirtualTransactionId *newer_snapshots;
+ int n_newer_snapshots, j, k;
+
+ newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
+ true, false,
+ PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
+ &n_newer_snapshots);
+ for (j = i; j < n_old_snapshots; j++)
+ {
+ if (!VirtualTransactionIdIsValid(old_snapshots[j]))
+ continue; /* found uninteresting in previous cycle */
+ for (k = 0; k < n_newer_snapshots; k++)
+ {
+ if (VirtualTransactionIdEquals(old_snapshots[j],
+ newer_snapshots[k]))
+ break;
+ }
+ if (k >= n_newer_snapshots) /* not there anymore */
+ SetInvalidVirtualTransactionId(old_snapshots[j]);
+ }
+ pfree(newer_snapshots);
+ }
+
+ if (VirtualTransactionIdIsValid(old_snapshots[i]))
+ VirtualXactLock(old_snapshots[i], true);
+ }
+}
+
+
+/*
* DefineIndex
* Creates a new index.
*
@@ -321,12 +401,9 @@ DefineIndex(IndexStmt *stmt,
IndexInfo *indexInfo;
int numberOfAttributes;
TransactionId limitXmin;
- VirtualTransactionId *old_snapshots;
- int n_old_snapshots;
LockRelId heaprelid;
LOCKTAG heaplocktag;
Snapshot snapshot;
- int i;
/*
* count attributes in index
@@ -766,74 +843,9 @@ DefineIndex(IndexStmt *stmt,
* The index is now valid in the sense that it contains all currently
* interesting tuples. But since it might not contain tuples deleted just
* before the reference snap was taken, we have to wait out any
- * transactions that might have older snapshots. Obtain a list of VXIDs
- * of such transactions, and wait for them individually.
- *
- * We can exclude any running transactions that have xmin > the xmin of
- * our reference snapshot; their oldest snapshot must be newer than ours.
- * We can also exclude any transactions that have xmin = zero, since they
- * evidently have no live snapshot at all (and any one they might be in
- * process of taking is certainly newer than ours). Transactions in other
- * DBs can be ignored too, since they'll never even be able to see this
- * index.
- *
- * We can also exclude autovacuum processes and processes running manual
- * lazy VACUUMs, because they won't be fazed by missing index entries
- * either. (Manual ANALYZEs, however, can't be excluded because they
- * might be within transactions that are going to do arbitrary operations
- * later.)
- *
- * Also, GetCurrentVirtualXIDs never reports our own vxid, so we need not
- * check for that.
- *
- * If a process goes idle-in-transaction with xmin zero, we do not need to
- * wait for it anymore, per the above argument. We do not have the
- * infrastructure right now to stop waiting if that happens, but we can at
- * least avoid the folly of waiting when it is idle at the time we would
- * begin to wait. We do this by repeatedly rechecking the output of
- * GetCurrentVirtualXIDs. If, during any iteration, a particular vxid
- * doesn't show up in the output, we know we can forget about it.
+ * transactions that might have older snapshots.
*/
- old_snapshots = GetCurrentVirtualXIDs(limitXmin, true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_old_snapshots);
-
- for (i = 0; i < n_old_snapshots; i++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[i]))
- continue; /* found uninteresting in previous cycle */
-
- if (i > 0)
- {
- /* see if anything's changed ... */
- VirtualTransactionId *newer_snapshots;
- int n_newer_snapshots;
- int j;
- int k;
-
- newer_snapshots = GetCurrentVirtualXIDs(limitXmin,
- true, false,
- PROC_IS_AUTOVACUUM | PROC_IN_VACUUM,
- &n_newer_snapshots);
- for (j = i; j < n_old_snapshots; j++)
- {
- if (!VirtualTransactionIdIsValid(old_snapshots[j]))
- continue; /* found uninteresting in previous cycle */
- for (k = 0; k < n_newer_snapshots; k++)
- {
- if (VirtualTransactionIdEquals(old_snapshots[j],
- newer_snapshots[k]))
- break;
- }
- if (k >= n_newer_snapshots) /* not there anymore */
- SetInvalidVirtualTransactionId(old_snapshots[j]);
- }
- pfree(newer_snapshots);
- }
-
- if (VirtualTransactionIdIsValid(old_snapshots[i]))
- VirtualXactLock(old_snapshots[i], true);
- }
+ WaitForOlderSnapshots(limitXmin);
/*
* Index can now be marked valid -- update its pg_index entry
20131002_0003_reindex_refactoring.patch (application/octet-stream)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 826e504..7fb130b 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1091,6 +1091,102 @@ index_create(Relation heapRelation,
}
/*
+ * index_concurrent_build
+ *
+ * Build an index for a concurrent operation. The low-level locks taken
+ * here prevent only schema changes, and they need to be kept until the
+ * end of the transaction performing this operation.
+ */
+void
+index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary)
+{
+ Relation rel, indexRelation;
+ IndexInfo *indexInfo;
+
+ /* Open and lock the parent heap relation */
+ rel = heap_open(heapOid, ShareUpdateExclusiveLock);
+
+ /* And the target index relation */
+ indexRelation = index_open(indexOid, RowExclusiveLock);
+
+ /*
+ * We have to re-build the IndexInfo struct, since it was lost in the
+ * commit of the transaction where this concurrent index was created
+ * at the catalog level.
+ */
+ indexInfo = BuildIndexInfo(indexRelation);
+ Assert(!indexInfo->ii_ReadyForInserts);
+ indexInfo->ii_Concurrent = true;
+ indexInfo->ii_BrokenHotChain = false;
+
+ /* Now build the index */
+ index_build(rel, indexRelation, indexInfo, isprimary, false);
+
+ /* Close both the relations, but keep the locks */
+ heap_close(rel, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
+ * index_concurrent_set_dead
+ *
+ * Perform the last invalidation stage of DROP INDEX CONCURRENTLY before
+ * actually dropping the index. After calling this function the index is
+ * seen by all backends as dead. Low-level locks taken here are kept
+ * until the end of the transaction calling this function.
+ */
+void
+index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
+{
+ Relation heapRelation, indexRelation;
+
+ /*
+ * Now we must wait, if necessary, until no running transaction could
+ * be using the index for a query.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ */
+ WaitForLockers(locktag, AccessExclusiveLock);
+
+ /*
+ * No more predicate locks will be acquired on this index, and we're
+ * about to stop doing inserts into the index which could show
+ * conflicts with existing predicate locks, so now is the time to move
+ * them to the heap relation.
+ */
+ heapRelation = heap_open(heapOid, ShareUpdateExclusiveLock);
+ indexRelation = index_open(indexOid, ShareUpdateExclusiveLock);
+ TransferPredicateLocksToHeapRelation(indexRelation);
+
+ /*
+ * Now we are sure that nobody uses the index for queries; they just
+ * might have it open for updating it. So now we can unset indisready
+ * and indislive, then wait till nobody could be using it at all
+ * anymore.
+ */
+ index_set_state_flags(indexOid, INDEX_DROP_SET_DEAD);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh the table's index list. Forgetting just
+ * the index's relcache entry is not enough.
+ */
+ CacheInvalidateRelcache(heapRelation);
+
+ /*
+ * Close the relations again, though still holding session lock.
+ */
+ heap_close(heapRelation, NoLock);
+ index_close(indexRelation, NoLock);
+}
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
@@ -1442,50 +1538,8 @@ index_drop(Oid indexId, bool concurrent)
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. Note we do not need to worry about xacts that
- * open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- */
- WaitForLockers(heaplocktag, AccessExclusiveLock);
-
- /*
- * No more predicate locks will be acquired on this index, and we're
- * about to stop doing inserts into the index which could show
- * conflicts with existing predicate locks, so now is the time to move
- * them to the heap relation.
- */
- userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
- TransferPredicateLocksToHeapRelation(userIndexRelation);
-
- /*
- * Now we are sure that nobody uses the index for queries; they just
- * might have it open for updating it. So now we can unset indisready
- * and indislive, then wait till nobody could be using it at all
- * anymore.
- */
- index_set_state_flags(indexId, INDEX_DROP_SET_DEAD);
-
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh the table's index list. Forgetting just
- * the index's relcache entry is not enough.
- */
- CacheInvalidateRelcache(userHeapRelation);
-
- /*
- * Close the relations again, though still holding session lock.
- */
- heap_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrent_set_dead(heapId, indexId, heaplocktag);
/*
* Again, commit the transaction to make the pg_index update visible
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index fe72613..3067639 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -391,7 +391,6 @@ DefineIndex(IndexStmt *stmt,
Oid tablespaceId;
List *indexColNames;
Relation rel;
- Relation indexRelation;
HeapTuple tuple;
Form_pg_am accessMethodForm;
bool amcanorder;
@@ -758,27 +757,15 @@ DefineIndex(IndexStmt *stmt,
* HOT-chain or the extension of the chain is HOT-safe for this index.
*/
- /* Open and lock the parent heap relation */
- rel = heap_openrv(stmt->relation, ShareUpdateExclusiveLock);
-
- /* And the target index relation */
- indexRelation = index_open(indexRelationId, RowExclusiveLock);
-
/* Set ActiveSnapshot since functions in the indexes may need it */
PushActiveSnapshot(GetTransactionSnapshot());
- /* We have to re-build the IndexInfo struct, since it was lost in commit */
- indexInfo = BuildIndexInfo(indexRelation);
- Assert(!indexInfo->ii_ReadyForInserts);
- indexInfo->ii_Concurrent = true;
- indexInfo->ii_BrokenHotChain = false;
-
- /* Now build the index */
- index_build(rel, indexRelation, indexInfo, stmt->primary, false);
-
- /* Close both the relations, but keep the locks */
- heap_close(rel, NoLock);
- index_close(indexRelation, NoLock);
+ /* Perform concurrent build of index */
+ index_concurrent_build(RangeVarGetRelid(stmt->relation,
+ ShareUpdateExclusiveLock,
+ false),
+ indexRelationId,
+ stmt->primary);
/*
* Update the pg_index row to mark the index as ready for inserts. Once we
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e697275..9f29003 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -62,6 +62,14 @@ extern Oid index_create(Relation heapRelation,
bool concurrent,
bool is_internal);
+extern void index_concurrent_build(Oid heapOid,
+ Oid indexOid,
+ bool isprimary);
+
+extern void index_concurrent_set_dead(Oid heapOid,
+ Oid indexOid,
+ LOCKTAG locktag);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
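As a reviewing aid for the index_concurrent_set_dead() extraction above, here is a toy Python model (mine, not the patch's C) of the pg_index flag transitions in a concurrent drop. The action names echo the INDEX_DROP_* constants, and the second step is the one the new helper encapsulates for both DROP INDEX CONCURRENTLY and the REINDEX path.

```python
# Toy model of pg_index state flags as (indisvalid, indisready, indislive);
# hypothetical names, mirroring but not reproducing the C constants.
CLEAR_VALID = "clear_valid"   # index no longer chosen for new queries
SET_DEAD = "set_dead"         # index no longer touched at all

def apply_index_state_change(flags, action):
    valid, ready, live = flags
    if action == CLEAR_VALID:
        return (False, ready, live)
    if action == SET_DEAD:
        # Models index_set_state_flags(..., INDEX_DROP_SET_DEAD): unset
        # indisready and indislive; indisvalid must already be false here.
        assert not valid
        return (False, False, False)
    raise ValueError(action)

flags = (True, True, True)                        # a normal live index
flags = apply_index_state_change(flags, CLEAR_VALID)
# ... WaitForLockers() runs between the steps, each in its own transaction ...
flags = apply_index_state_change(flags, SET_DEAD)
```

The assertion in SET_DEAD captures the ordering invariant: an index must stop being used for queries (invalid) before sessions can be told to stop maintaining it entirely (dead).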
Attachment: 20131002_0004_reindex_conc_core.patch (application/octet-stream)
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index cefd323..2d7678b 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -863,8 +863,9 @@ ERROR: could not serialize access due to read/write dependencies among transact
<para>
Acquired by <command>VACUUM</command> (without <option>FULL</option>),
- <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>, and
- some forms of <command>ALTER TABLE</command>.
+ <command>ANALYZE</>, <command>CREATE INDEX CONCURRENTLY</>,
+ <command>REINDEX CONCURRENTLY</> and some forms of
+ <command>ALTER TABLE</command>.
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 7222665..5f42c4f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
+REINDEX { INDEX | TABLE | DATABASE | SYSTEM } [ CONCURRENTLY ] <replaceable class="PARAMETER">name</replaceable> [ FORCE ]
</synopsis>
</refsynopsisdiv>
@@ -68,9 +68,22 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
An index build with the <literal>CONCURRENTLY</> option failed, leaving
an <quote>invalid</> index. Such indexes are useless but it can be
convenient to use <command>REINDEX</> to rebuild them. Note that
- <command>REINDEX</> will not perform a concurrent build. To build the
- index without interfering with production you should drop the index and
- reissue the <command>CREATE INDEX CONCURRENTLY</> command.
+ <command>REINDEX</> will perform a concurrent build if <literal>
+ CONCURRENTLY</> is specified. To build the index without interfering
+ with production you should drop the index and reissue either the
+ <command>CREATE INDEX CONCURRENTLY</> or <command>REINDEX CONCURRENTLY</>
+ command. Indexes of toast relations can be rebuilt with <command>REINDEX
+ CONCURRENTLY</>.
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>
+ Concurrent indexes based on a <literal>PRIMARY KEY</> or an <literal>
+ EXCLUDE</> constraint need to be dropped with <literal>ALTER TABLE
+ DROP CONSTRAINT</>. This is also the case for <literal>UNIQUE</> indexes
+ backed by constraints. Other indexes can be dropped using <literal>DROP INDEX</>,
+ including invalid toast indexes.
</para>
</listitem>
@@ -139,6 +152,21 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will rebuild the
+ index without taking any locks that prevent concurrent inserts,
+ updates, or deletes on the table; whereas a standard reindex build
+ locks out writes (but not reads) on the table until it's done.
+ There are several caveats to be aware of when using this option
+ — see <xref linkend="SQL-REINDEX-CONCURRENTLY"
+ endterm="SQL-REINDEX-CONCURRENTLY-title">.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>FORCE</literal></term>
<listitem>
<para>
@@ -231,6 +259,115 @@ REINDEX { INDEX | TABLE | DATABASE | SYSTEM } <replaceable class="PARAMETER">nam
to be reindexed by separate commands. This is still possible, but
redundant.
</para>
+
+
+ <refsect2 id="SQL-REINDEX-CONCURRENTLY">
+ <title id="SQL-REINDEX-CONCURRENTLY-title">Rebuilding Indexes Concurrently</title>
+
+ <indexterm zone="SQL-REINDEX-CONCURRENTLY">
+ <primary>index</primary>
+ <secondary>rebuilding concurrently</secondary>
+ </indexterm>
+
+ <para>
+ Rebuilding an index can interfere with regular operation of a database.
+ Normally <productname>PostgreSQL</> locks the table whose index is rebuilt
+ against writes and performs the entire index build with a single scan of the
+ table. Other transactions can still read the table, but if they try to
+ insert, update, or delete rows in the table they will block until the
+ index rebuild is finished. This could have a severe effect if the system is
+ a live production database. Very large tables can take many hours to be
+ indexed, and even for smaller tables, an index rebuild can lock out writers
+ for periods that are unacceptably long for a production system.
+ </para>
+
+ <para>
+ <productname>PostgreSQL</> supports rebuilding indexes without locking
+ out writes. This method is invoked by specifying the
+ <literal>CONCURRENTLY</> option of <command>REINDEX</>.
+ When this option is used, <productname>PostgreSQL</> must perform two
+ scans of the table for each index that needs to be rebuilt, and in
+ addition it must wait for all existing transactions that could potentially
+ use the index to terminate. This method requires more total work than a
+ standard index rebuild and takes significantly longer to complete as it
+ needs to wait for unfinished transactions that might modify the index.
+ However, since it allows normal operations to continue while the index
+ is rebuilt, this method is useful for rebuilding indexes in a production
+ environment. Of course, the extra CPU, memory and I/O load imposed by
+ the index rebuild might slow other operations.
+ </para>
+
+ <para>
+ In a concurrent index build, a new index whose storage will replace the one
+ to be rebuilt is actually entered into the system catalogs in one transaction,
+ then two table scans occur in two more transactions. Once this is done,
+ the old and new indexes are swapped. Finally, two additional transactions
+ are used to mark the concurrent index as not ready and then drop it.
+ </para>
+
+ <para>
+ If a problem arises while rebuilding the indexes, such as a
+ uniqueness violation in a unique index, the <command>REINDEX</>
+ command will fail but leave behind an <quote>invalid</> new index on top
+ of the existing one. This index will be ignored for querying purposes
+ because it might be incomplete; however it will still consume update
+ overhead. The <application>psql</> <command>\d</> command will report
+ such an index as <literal>INVALID</>:
+
+<programlisting>
+postgres=# \d tab
+ Table "public.tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ col | integer |
+Indexes:
+ "idx" btree (col)
+ "idx_cct" btree (col) INVALID
+</programlisting>
+
+ The recommended recovery method in such cases is to drop the concurrent
+ index and perform <command>REINDEX CONCURRENTLY</> again.
+ The concurrent index created during the processing has a name ending
+ with the suffix <literal>_cct</>. This also works for indexes of toast relations.
+ </para>
+
+ <para>
+ Regular index builds permit other regular index builds on the
+ same table to occur in parallel, but only one concurrent index build
+ can occur on a table at a time. In both cases, no other types of schema
+ modification on the table are allowed meanwhile. Another difference
+ is that a regular <command>REINDEX TABLE</> or <command>REINDEX INDEX</>
+ command can be performed within a transaction block, but
+ <command>REINDEX CONCURRENTLY</> cannot. <command>REINDEX DATABASE</> is
+ by default not allowed to run inside a transaction block, so in this case
+ <command>CONCURRENTLY</> is not supported.
+ </para>
+
+ <para>
+ Invalid indexes of toast relations can be dropped if a failure occurred
+ during <command>REINDEX CONCURRENTLY</>. The valid index, of which each
+ toast relation has exactly one, cannot be dropped.
+ </para>
+
+ <para>
+ <command>REINDEX DATABASE</command> used with <command>CONCURRENTLY
+ </command> rebuilds concurrently only the non-system relations. System
+ relations are rebuilt non-concurrently. Toast indexes are rebuilt
+ concurrently if the relation they depend on is not a system
+ relation.
+ </para>
+
+ <para>
+ <command>REINDEX</command> uses <literal>ACCESS EXCLUSIVE</literal> lock
+ on all the relations involved during operation. When <command>CONCURRENTLY</command>
+ is specified, the operation is done with <literal>SHARE UPDATE EXCLUSIVE</literal>.
+ </para>
+
+ <para>
+ <command>REINDEX SYSTEM</command> does not support <command>CONCURRENTLY
+ </command>.
+ </para>
+ </refsect2>
</refsect1>
<refsect1>
@@ -262,7 +399,18 @@ $ <userinput>psql broken_db</userinput>
...
broken_db=> REINDEX DATABASE broken_db;
broken_db=> \q
-</programlisting></para>
+</programlisting>
+ </para>
+
+ <para>
+ Rebuild all the indexes of a table, allowing read and write operations
+ on the involved relations while the rebuild runs:
+
+<programlisting>
+REINDEX TABLE CONCURRENTLY my_broken_table;
+</programlisting>
+ </para>
+
</refsect1>
<refsect1>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 7fb130b..4635be8 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -43,9 +43,11 @@
#include "catalog/pg_trigger.h"
#include "catalog/pg_type.h"
#include "catalog/storage.h"
+#include "commands/defrem.h"
#include "commands/tablecmds.h"
#include "commands/trigger.h"
#include "executor/executor.h"
+#include "mb/pg_wchar.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
@@ -672,6 +674,10 @@ UpdateIndexRelation(Oid indexoid,
* will be marked "invalid" and the caller must take additional steps
* to fix it up.
* is_internal: if true, post creation hook for new index
+ * is_reindex: if true, create an index as a duplicate of an existing index
+ * during a concurrent operation. This index can also be an index of a
+ * toast relation. Sufficient locks are normally already taken on the
+ * related relations when this is called during a concurrent operation.
*
* Returns the OID of the created index.
*/
@@ -695,7 +701,8 @@ index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal)
+ bool is_internal,
+ bool is_reindex)
{
Oid heapRelationId = RelationGetRelid(heapRelation);
Relation pg_class;
@@ -738,19 +745,22 @@ index_create(Relation heapRelation,
/*
* concurrent index build on a system catalog is unsafe because we tend to
- * release locks before committing in catalogs
+ * release locks before committing in catalogs. If the index is created during
+ * a REINDEX CONCURRENTLY operation, sufficient locks are already taken.
*/
if (concurrent &&
- IsSystemRelation(heapRelation))
+ IsSystemRelation(heapRelation) &&
+ !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("concurrent index creation on system catalog tables is not supported")));
/*
- * This case is currently not supported, but there's no way to ask for it
- * in the grammar anyway, so it can't happen.
+ * This case is currently supported only during a concurrent index
+ * rebuild; there is no other way to ask for it in the grammar anyway.
*/
- if (concurrent && is_exclusion)
+ if (concurrent && is_exclusion && !is_reindex)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg_internal("concurrent index creation for exclusion constraints is not supported")));
@@ -1090,6 +1100,190 @@ index_create(Relation heapRelation,
return indexRelationId;
}
+
+/*
+ * index_concurrent_create
+ *
+ * Create an index based on the given one that will be used for concurrent
+ * operations. The index is inserted into catalogs and needs to be built later
+ * on. This is called during concurrent index processing. The heap relation
+ * on which the index is based needs to be closed by the caller.
+ */
+Oid
+index_concurrent_create(Relation heapRelation, Oid indOid, char *concurrentName)
+{
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ Oid concurrentOid = InvalidOid;
+ List *columnNames = NIL;
+ List *indexprs = NIL;
+ ListCell *indexpr_item;
+ int i;
+ HeapTuple indexTuple, classTuple;
+ Datum indclassDatum, colOptionDatum, optionDatum;
+ oidvector *indclass;
+ int2vector *indcoloptions;
+ bool isnull;
+ bool initdeferred = false;
+ Oid constraintOid = get_index_constraint(indOid);
+
+ indexRelation = index_open(indOid, RowExclusiveLock);
+
+ /* Concurrent index uses the same index information as former index */
+ indexInfo = BuildIndexInfo(indexRelation);
+
+ /*
+ * Determine if index is initdeferred, this depends on its dependent
+ * constraint.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ /* Look for the correct value */
+ HeapTuple constraintTuple;
+ Form_pg_constraint constraintForm;
+
+ constraintTuple = SearchSysCache1(CONSTROID,
+ ObjectIdGetDatum(constraintOid));
+ if (!HeapTupleIsValid(constraintTuple))
+ elog(ERROR, "cache lookup failed for constraint %u",
+ constraintOid);
+ constraintForm = (Form_pg_constraint) GETSTRUCT(constraintTuple);
+ initdeferred = constraintForm->condeferred;
+
+ ReleaseSysCache(constraintTuple);
+ }
+
+ /* Get the expressions associated with this index, used to build column names */
+ indexprs = RelationGetIndexExpressions(indexRelation);
+ indexpr_item = list_head(indexprs);
+
+ /* Build the list of column names, necessary for index_create */
+ for (i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {
+ char *origname, *curname;
+ char buf[NAMEDATALEN];
+ AttrNumber attnum = indexInfo->ii_KeyAttrNumbers[i];
+ int j;
+
+ /* Pick up column name depending on attribute type */
+ if (attnum > 0)
+ {
+ /*
+ * This is a column attribute, so simply pick column name from
+ * relation.
+ */
+ Form_pg_attribute attform = heapRelation->rd_att->attrs[attnum - 1];
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else if (attnum < 0)
+ {
+ /* Case of a system attribute */
+ Form_pg_attribute attform = SystemAttributeDefinition(attnum,
+ heapRelation->rd_rel->relhasoids);
+ origname = pstrdup(NameStr(attform->attname));
+ }
+ else
+ {
+ Node *indnode;
+ /*
+ * This is the case of an expression, so pick up the expression
+ * name.
+ */
+ Assert(indexpr_item != NULL);
+ indnode = (Node *) lfirst(indexpr_item);
+ indexpr_item = lnext(indexpr_item);
+ origname = deparse_expression(indnode,
+ deparse_context_for(RelationGetRelationName(heapRelation),
+ RelationGetRelid(heapRelation)),
+ false, false);
+ }
+
+ /*
+ * Check if the name picked has any conflict with existing names and
+ * change it.
+ */
+ curname = origname;
+ for (j = 1;; j++)
+ {
+ ListCell *lc2;
+ char nbuf[32];
+ int nlen;
+
+ foreach(lc2, columnNames)
+ {
+ if (strcmp(curname, (char *) lfirst(lc2)) == 0)
+ break;
+ }
+ if (lc2 == NULL)
+ break; /* found nonconflicting name */
+
+ sprintf(nbuf, "%d", j);
+
+ /* Ensure generated names are shorter than NAMEDATALEN */
+ nlen = pg_mbcliplen(origname, strlen(origname),
+ NAMEDATALEN - 1 - strlen(nbuf));
+ memcpy(buf, origname, nlen);
+ strcpy(buf + nlen, nbuf);
+ curname = buf;
+ }
+
+ /* Append name to existing list */
+ columnNames = lappend(columnNames, pstrdup(curname));
+ }
+
+ /* Get the array of class and column options IDs from index info */
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indOid);
+ indclassDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indclass, &isnull);
+ Assert(!isnull);
+ indclass = (oidvector *) DatumGetPointer(indclassDatum);
+
+ colOptionDatum = SysCacheGetAttr(INDEXRELID, indexTuple,
+ Anum_pg_index_indoption, &isnull);
+ Assert(!isnull);
+ indcoloptions = (int2vector *) DatumGetPointer(colOptionDatum);
+
+ /* Fetch options of index if any */
+ classTuple = SearchSysCache1(RELOID, ObjectIdGetDatum(indOid));
+ if (!HeapTupleIsValid(classTuple))
+ elog(ERROR, "cache lookup failed for relation %u", indOid);
+ optionDatum = SysCacheGetAttr(RELOID, classTuple,
+ Anum_pg_class_reloptions, &isnull);
+
+ /* Now create the concurrent index */
+ concurrentOid = index_create(heapRelation,
+ (const char *) concurrentName,
+ InvalidOid,
+ InvalidOid,
+ indexInfo,
+ columnNames,
+ indexRelation->rd_rel->relam,
+ indexRelation->rd_rel->reltablespace,
+ indexRelation->rd_indcollation,
+ indclass->values,
+ indcoloptions->values,
+ optionDatum,
+ indexRelation->rd_index->indisprimary,
+ OidIsValid(constraintOid), /* is constraint? */
+ !indexRelation->rd_index->indimmediate, /* is deferrable? */
+ initdeferred, /* is initially deferred? */
+ true, /* allow table to be a system catalog? */
+ true, /* skip build? */
+ true, /* concurrent? */
+ false, /* is_internal */
+ true); /* reindex? */
+
+ /* Close the relations used and clean up */
+ index_close(indexRelation, NoLock);
+ ReleaseSysCache(indexTuple);
+ ReleaseSysCache(classTuple);
+
+ return concurrentOid;
+}
+
+
/*
* index_concurrent_build
*
@@ -1129,6 +1323,65 @@ index_concurrent_build(Oid heapOid,
index_close(indexRelation, NoLock);
}
+
+/*
+ * index_concurrent_swap
+ *
+ * Swap an old index and a new index in a concurrent context. For the time
+ * being, this switches the relfilenodes of the two indexes. If extra
+ * operations are necessary during a concurrent swap, processing should
+ * be added here. The relations do not require an exclusive lock thanks
+ * to MVCC catalog access.
+ */
+void
+index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid)
+{
+ Relation oldIndexRel, newIndexRel, pg_class;
+ HeapTuple oldIndexTuple, newIndexTuple;
+ Form_pg_class oldIndexForm, newIndexForm;
+ Oid tmpnode;
+
+ /*
+ * Take a necessary lock on the old and new index before swapping them.
+ */
+ oldIndexRel = relation_open(oldIndexOid, ShareUpdateExclusiveLock);
+ newIndexRel = relation_open(newIndexOid, ShareUpdateExclusiveLock);
+
+ /* Now swap relfilenode of those indexes */
+ pg_class = heap_open(RelationRelationId, RowExclusiveLock);
+
+ oldIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(oldIndexOid));
+ if (!HeapTupleIsValid(oldIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", oldIndexOid);
+ newIndexTuple = SearchSysCacheCopy1(RELOID,
+ ObjectIdGetDatum(newIndexOid));
+ if (!HeapTupleIsValid(newIndexTuple))
+ elog(ERROR, "could not find tuple for relation %u", newIndexOid);
+ oldIndexForm = (Form_pg_class) GETSTRUCT(oldIndexTuple);
+ newIndexForm = (Form_pg_class) GETSTRUCT(newIndexTuple);
+
+ /* Here is where the actual swap happens */
+ tmpnode = oldIndexForm->relfilenode;
+ oldIndexForm->relfilenode = newIndexForm->relfilenode;
+ newIndexForm->relfilenode = tmpnode;
+
+ /* Then update the tuples for each relation */
+ simple_heap_update(pg_class, &oldIndexTuple->t_self, oldIndexTuple);
+ simple_heap_update(pg_class, &newIndexTuple->t_self, newIndexTuple);
+ CatalogUpdateIndexes(pg_class, oldIndexTuple);
+ CatalogUpdateIndexes(pg_class, newIndexTuple);
+
+ /* Close relations and clean up */
+ heap_freetuple(oldIndexTuple);
+ heap_freetuple(newIndexTuple);
+ heap_close(pg_class, RowExclusiveLock);
+
+ /* The lock taken previously is not released until the end of transaction */
+ relation_close(oldIndexRel, NoLock);
+ relation_close(newIndexRel, NoLock);
+}
+
/*
* index_concurrent_set_dead
*
@@ -1187,6 +1440,71 @@ index_concurrent_set_dead(Oid heapOid, Oid indexOid, LOCKTAG locktag)
}
/*
+ * index_concurrent_drop
+ *
+ * Drop a single index as the last step of a concurrent index process.
+ * Deletion is done through performDeletion, or the dependencies of the
+ * index would not get dropped. At this point all the indexes are already
+ * considered invalid and dead, so they can be dropped without any
+ * concurrent option, as it is certain that they will not interact with
+ * other server sessions.
+ */
+void
+index_concurrent_drop(Oid indexOid)
+{
+ Oid constraintOid = get_index_constraint(indexOid);
+ ObjectAddress object;
+ Form_pg_index indexForm;
+ Relation pg_index;
+ HeapTuple indexTuple;
+
+ /*
+ * Check that the index being dropped is not alive; if it were, it
+ * might still be used by other backends.
+ */
+ pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+
+ indexTuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexOid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexOid);
+ indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ /*
+ * This is only a safety check, to prevent live indexes from being
+ * dropped.
+ */
+ if (indexForm->indislive)
+ elog(ERROR, "cannot drop live index with OID %u", indexOid);
+
+ /* Clean up */
+ heap_close(pg_index, RowExclusiveLock);
+
+ /*
+ * We are sure to have a dead index, so begin the drop process.
+ * Register constraint or index for drop.
+ */
+ if (OidIsValid(constraintOid))
+ {
+ object.classId = ConstraintRelationId;
+ object.objectId = constraintOid;
+ }
+ else
+ {
+ object.classId = RelationRelationId;
+ object.objectId = indexOid;
+ }
+
+ object.objectSubId = 0;
+
+ /* Perform deletion for normal and toast indexes */
+ performDeletion(&object,
+ DROP_RESTRICT,
+ 0);
+}
+
+
+/*
* index_constraint_create
*
* Set up a constraint associated with an index
diff --git a/src/backend/catalog/toasting.c b/src/backend/catalog/toasting.c
index 385d64d..0c2971b 100644
--- a/src/backend/catalog/toasting.c
+++ b/src/backend/catalog/toasting.c
@@ -281,7 +281,7 @@ create_toast_table(Relation rel, Oid toastOid, Oid toastIndexOid, Datum reloptio
rel->rd_rel->reltablespace,
collationObjectId, classObjectId, coloptions, (Datum) 0,
true, false, false, false,
- true, false, false, true);
+ true, false, false, false, false);
heap_close(toast_rel, NoLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 3067639..488cf5d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -68,8 +68,9 @@ static void ComputeIndexAttrs(IndexInfo *indexInfo,
static Oid GetIndexOpClass(List *opclass, Oid attrType,
char *accessMethodName, Oid accessMethodId);
static char *ChooseIndexName(const char *tabname, Oid namespaceId,
- List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint);
+ List *colnames, List *exclusionOpNames,
+ bool primary, bool isconstraint,
+ bool concurrent);
static char *ChooseIndexNameAddition(List *colnames);
static List *ChooseIndexColumnNames(List *indexElems);
static void RangeVarCallbackForReindexIndex(const RangeVar *relation,
@@ -282,7 +283,7 @@ CheckIndexCompatible(Oid oldId,
* Wait for transactions that might have older snapshot than the given xmin
* limit, because it might not contain tuples deleted just before it has
* been taken. Obtain a list of VXIDs of such transactions, and wait for them
- * individually.
+ * individually, or return an error status to the caller if no wait was done.
*
* We can exclude any running transactions that have xmin > the xmin given;
* their oldest snapshot must be newer than our xmin limit.
@@ -529,7 +530,8 @@ DefineIndex(IndexStmt *stmt,
indexColNames,
stmt->excludeOpNames,
stmt->primary,
- stmt->isconstraint);
+ stmt->isconstraint,
+ false);
/*
* look up the access method, verify it can handle the requested features
@@ -676,7 +678,7 @@ DefineIndex(IndexStmt *stmt,
stmt->isconstraint, stmt->deferrable, stmt->initdeferred,
allowSystemTableMods,
skip_build || stmt->concurrent,
- stmt->concurrent, !check_rights);
+ stmt->concurrent, !check_rights, false);
/* Add any requested comment */
if (stmt->idxcomment != NULL)
@@ -859,6 +861,542 @@ DefineIndex(IndexStmt *stmt,
/*
+ * ReindexRelationConcurrently
+ *
+ * Process REINDEX CONCURRENTLY for the given relation Oid. The relation
+ * can be either an index or a table. If a table is specified, each step
+ * of the reindexing process is applied to all of the table's indexes at
+ * once, as well as to its dependent toast indexes.
+ */
+bool
+ReindexRelationConcurrently(Oid relationOid)
+{
+ List *concurrentIndexIds = NIL,
+ *indexIds = NIL,
+ *parentRelationIds = NIL,
+ *lockTags = NIL,
+ *relationLocks = NIL;
+ ListCell *lc, *lc2;
+ Snapshot snapshot;
+
+ /*
+ * Extract the list of indexes that are going to be rebuilt based on the
+ * relation Oid given by the caller. If the relkind of the given relation
+ * Oid is a table, all its valid indexes will be rebuilt, including its
+ * associated toast table indexes. If the relkind is an index, this index
+ * itself will be rebuilt. The locks taken on the parent relations and the
+ * involved indexes are kept until this transaction is committed, to
+ * protect against schema changes that might occur before a session lock
+ * is taken on each relation.
+ */
+ switch (get_rel_relkind(relationOid))
+ {
+ case RELKIND_RELATION:
+ case RELKIND_MATVIEW:
+ case RELKIND_TOASTVALUE:
+ {
+ /*
+ * In the case of a relation, find all its indexes
+ * including toast indexes.
+ */
+ Relation heapRelation;
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, relationOid);
+
+ /* A shared relation cannot be reindexed concurrently */
+ if (IsSharedRelation(relationOid))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for shared relations")));
+
+ /* A system catalog cannot be reindexed concurrently */
+ if (IsSystemNamespace(get_rel_namespace(relationOid)))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("concurrent reindex is not supported for catalog relations")));
+
+ /* Open relation to get its indexes */
+ heapRelation = heap_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Add all the valid indexes of relation to list */
+ foreach(lc2, RelationGetIndexList(heapRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ /* Also add the toast indexes */
+ if (OidIsValid(heapRelation->rd_rel->reltoastrelid))
+ {
+ Oid toastOid = heapRelation->rd_rel->reltoastrelid;
+ Relation toastRelation = heap_open(toastOid,
+ ShareUpdateExclusiveLock);
+
+ /* Track this relation for session locks */
+ parentRelationIds = lappend_oid(parentRelationIds, toastOid);
+
+ foreach(lc2, RelationGetIndexList(toastRelation))
+ {
+ Oid cellOid = lfirst_oid(lc2);
+ Relation indexRelation = index_open(cellOid,
+ ShareUpdateExclusiveLock);
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(cellOid)),
+ get_rel_name(cellOid))));
+ else
+ indexIds = lappend_oid(indexIds, cellOid);
+
+ index_close(indexRelation, NoLock);
+ }
+
+ heap_close(toastRelation, NoLock);
+ }
+
+ heap_close(heapRelation, NoLock);
+ break;
+ }
+ case RELKIND_INDEX:
+ {
+ /*
+ * For an index, simply add its Oid to the list. Invalid indexes
+ * cannot be included in the list.
+ */
+ Relation indexRelation = index_open(relationOid, ShareUpdateExclusiveLock);
+
+ /* Track the parent relation of this index for session locks */
+ parentRelationIds = list_make1_oid(IndexGetRelation(relationOid, false));
+
+ if (!indexRelation->rd_index->indisvalid)
+ ereport(WARNING,
+ (errcode(ERRCODE_INDEX_CORRUPTED),
+ errmsg("cannot reindex concurrently invalid index \"%s.%s\", skipping",
+ get_namespace_name(get_rel_namespace(relationOid)),
+ get_rel_name(relationOid))));
+ else
+ indexIds = list_make1_oid(relationOid);
+
+ index_close(indexRelation, NoLock);
+ break;
+ }
+ default:
+ /* Return an error if the relation type is not supported */
+ ereport(ERROR,
+ (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("cannot reindex this type of relation concurrently")));
+ break;
+ }
+
+ /* Definitely no indexes, so leave */
+ if (indexIds == NIL)
+ return false;
+
+ Assert(parentRelationIds != NIL);
+
+ /*
+ * Phase 1 of REINDEX CONCURRENTLY
+ *
+ * Here begins the process of rebuilding the indexes concurrently. First
+ * we create, for each index, a new index based on the same definition
+ * as the former one; at this point it is only registered in the catalogs
+ * and will be built later. These operations can be performed for all the
+ * indexes of a parent relation at once, including the indexes of its
+ * toast relation.
+ */
+
+ /* Do the concurrent index creation for each index */
+ foreach(lc, indexIds)
+ {
+ char *concurrentName;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = InvalidOid;
+ Relation indexRel,
+ indexParentRel,
+ indexConcurrentRel;
+ LockRelId *lockrelid;
+
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ /* Open the parent relation of the index, which might be a toast relation */
+ indexParentRel = heap_open(indexRel->rd_index->indrelid,
+ ShareUpdateExclusiveLock);
+
+ /* Choose a relation name for the concurrent index */
+ concurrentName = ChooseIndexName(get_rel_name(indOid),
+ get_rel_namespace(indexRel->rd_index->indrelid),
+ NULL,
+ NIL,
+ false,
+ false,
+ true);
+
+ /* Create the concurrent index based on the given index */
+ concurrentOid = index_concurrent_create(indexParentRel,
+ indOid,
+ concurrentName);
+
+ /*
+ * Now open the concurrent index relation; a lock is needed on it as
+ * well.
+ */
+ indexConcurrentRel = index_open(concurrentOid, ShareUpdateExclusiveLock);
+
+ /* Save the concurrent index Oid */
+ concurrentIndexIds = lappend_oid(concurrentIndexIds, concurrentOid);
+
+ /*
+ * Save the lockrelid of each index to protect it from being dropped,
+ * then close the relations. Note that each entry must be palloc'd:
+ * appending the address of a stack variable would make all the list
+ * cells point at the same storage. The lockrelid of the parent relation
+ * is not taken here to avoid taking multiple locks on the same relation;
+ * instead we rely on parentRelationIds built earlier.
+ */
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+ lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ *lockrelid = indexConcurrentRel->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ index_close(indexRel, NoLock);
+ index_close(indexConcurrentRel, NoLock);
+ heap_close(indexParentRel, NoLock);
+ }
+
+ /*
+ * Save the heap locks of the parent relations for the following wait
+ * phases, since other backends might conflict with this session.
+ */
+ foreach(lc, parentRelationIds)
+ {
+ Relation heapRelation = heap_open(lfirst_oid(lc), ShareUpdateExclusiveLock);
+ LockRelId *lockrelid = (LockRelId *) palloc(sizeof(LockRelId));
+ LOCKTAG *heaplocktag = (LOCKTAG *) palloc(sizeof(LOCKTAG));
+
+ /*
+ * Add the lockrelid of the parent relation to the list of locked
+ * relations; the entry must be palloc'd as it outlives this loop.
+ */
+ *lockrelid = heapRelation->rd_lockInfo.lockRelId;
+ relationLocks = lappend(relationLocks, lockrelid);
+
+ /* Save the LOCKTAG for this parent relation for the wait phase */
+ SET_LOCKTAG_RELATION(*heaplocktag, lockrelid->dbId, lockrelid->relId);
+ lockTags = lappend(lockTags, heaplocktag);
+
+ /* Close heap relation */
+ heap_close(heapRelation, NoLock);
+ }
+
+ /*
+ * For a concurrent build, it is necessary to make the catalog entries
+ * visible to other transactions before actually building the indexes.
+ * This will prevent them from making incompatible HOT updates. The
+ * indexes are marked as not ready and invalid so that no other
+ * transactions will try to use them for INSERT or SELECT.
+ *
+ * Before committing, get a session-level lock on each parent relation,
+ * each old index and each concurrent index, to ensure that none of
+ * them are dropped until the operation is done.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ LockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+
+ /*
+ * Phase 2 of REINDEX CONCURRENTLY
+ *
+ * Build each concurrent index in a separate transaction, to avoid
+ * keeping a transaction open for an unnecessarily long time. Each
+ * concurrent build will replace one of the old indexes. Before doing
+ * so, we must wait until no running transaction could still have the
+ * parent table of an index open.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForLockersMultiple(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Relation indexRel;
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid heapOid;
+ bool primary;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start a new transaction for this concurrent index build */
+ StartTransactionCommand();
+
+ /* Set ActiveSnapshot since functions in the indexes may need it */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /*
+ * The index relation was closed by the previous commit, so reopen it
+ * and save what we need before closing it again; the relcache entry
+ * must not be dereferenced after index_close().
+ */
+ indexRel = index_open(indOid, ShareUpdateExclusiveLock);
+ heapOid = indexRel->rd_index->indrelid;
+ primary = indexRel->rd_index->indisprimary;
+ index_close(indexRel, ShareUpdateExclusiveLock);
+
+ /* Perform the concurrent build of the new index */
+ index_concurrent_build(heapOid,
+ concurrentOid,
+ primary);
+
+ /*
+ * Update the pg_index row of the concurrent index as ready for inserts.
+ * Once we commit this transaction, any new transactions that open the
+ * table must insert new entries into the index for insertions and
+ * non-HOT updates.
+ */
+ index_set_state_flags(concurrentOid, INDEX_CREATE_SET_READY);
+
+ /* we can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /*
+ * Commit this transaction to make the indisready update visible for
+ * concurrent index.
+ */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 3 of REINDEX CONCURRENTLY
+ *
+ * During this phase the concurrent indexes catch up with any new
+ * tuples that were inserted in the parent table while they were
+ * being built.
+ *
+ * We once again wait until no transaction can have the table open
+ * without knowing that the index is ready for inserts. Each index
+ * validation is done in a separate transaction, to avoid keeping a
+ * transaction open for an unnecessarily long time.
+ */
+
+ /* Perform a wait on all the session locks */
+ StartTransactionCommand();
+ WaitForLockersMultiple(lockTags, ShareLock);
+ CommitTransactionCommand();
+
+ /*
+ * Perform a scan of each concurrent index with the heap, then insert
+ * any missing index entries.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ TransactionId limitXmin;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Open separate transaction to validate index */
+ StartTransactionCommand();
+
+ /* Get the parent relation Oid */
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Take the reference snapshot that will be used for this concurrent
+ * index's validation.
+ */
+ snapshot = RegisterSnapshot(GetTransactionSnapshot());
+ PushActiveSnapshot(snapshot);
+
+ /* Validate the index, which might be a toast index */
+ validate_index(relOid, indOid, snapshot);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /*
+ * We can now do away with our active snapshot, but we still need to
+ * save its xmin limit so we can wait for older snapshots.
+ */
+ limitXmin = snapshot->xmin;
+ PopActiveSnapshot();
+
+ /* And we can remove the validating snapshot too */
+ UnregisterSnapshot(snapshot);
+
+ /*
+ * The concurrent index is now complete, as it contains all the tuples
+ * it needs. However, it might not reflect tuples deleted before the
+ * reference snapshot was taken, so we must wait for the transactions
+ * that might have snapshots older than ours.
+ */
+ WaitForOlderSnapshots(limitXmin);
+
+ /* Commit this transaction to make the concurrent index valid */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 4 of REINDEX CONCURRENTLY
+ *
+ * Now that the concurrent indexes have been validated and could be
+ * used, we need to swap each concurrent index with its corresponding
+ * old index. Note that the concurrent index used for the swap is not
+ * marked as valid, because we need to keep the former index and the
+ * concurrent index with different valid statuses: otherwise the number
+ * of indexes on the parent relation could explode if this operation
+ * failed multiple times in a row for one reason or another.
+ */
+
+ /* Swap the indexes and mark the indexes that have the old data as invalid */
+ forboth(lc, indexIds, lc2, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid concurrentOid = lfirst_oid(lc2);
+ Oid relOid;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /*
+ * Each index needs to be swapped in a separate transaction, so start
+ * a new one.
+ */
+ StartTransactionCommand();
+
+ /* Swap old index and its concurrent */
+ index_concurrent_swap(concurrentOid, indOid);
+
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ relOid = IndexGetRelation(indOid, false);
+ CacheInvalidateRelcacheByRelid(relOid);
+
+ /* Commit this transaction and make old index invalidation visible */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 5 of REINDEX CONCURRENTLY
+ *
+ * The concurrent indexes now hold the old relfilenodes of the original
+ * indexes, and must be marked as dead so that no remaining transaction
+ * tries to use them. Each operation is performed in a separate
+ * transaction.
+ */
+
+ /* Now mark the concurrent indexes as not ready */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indOid = lfirst_oid(lc);
+ Oid relOid;
+ LOCKTAG *heapLockTag = NULL;
+ ListCell *cell;
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ StartTransactionCommand();
+ relOid = IndexGetRelation(indOid, false);
+
+ /*
+ * Find the LOCKTAG of the parent table of this index; we need to wait
+ * for locks on it.
+ */
+ foreach(cell, lockTags)
+ {
+ LOCKTAG *localTag = (LOCKTAG *) lfirst(cell);
+ if (relOid == localTag->locktag_field2)
+ heapLockTag = localTag;
+ }
+ Assert(heapLockTag && heapLockTag->locktag_field2 != InvalidOid);
+
+ /*
+ * Finish the index invalidation and set it as dead. Note that it is
+ * necessary to wait for virtual locks on the parent relation before
+ * setting the index as dead.
+ */
+ index_concurrent_set_dead(relOid, indOid, *heapLockTag);
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Phase 6 of REINDEX CONCURRENTLY
+ *
+ * Drop the concurrent indexes. This needs to be done through
+ * performDeletion, or the dependencies of the old indexes will not be
+ * dropped. The internal mechanism of DROP INDEX CONCURRENTLY is not
+ * used, as the indexes here are already considered dead and invalid,
+ * so no other backend will use them.
+ */
+ foreach(lc, concurrentIndexIds)
+ {
+ Oid indexOid = lfirst_oid(lc);
+
+ /* Check for any process interruption */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Start transaction to drop this index */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for next step */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ /* Drop the concurrent index within this transaction */
+ index_concurrent_drop(indexOid);
+
+ /* We can do away with our snapshot */
+ PopActiveSnapshot();
+
+ /* Commit this transaction to make the update visible. */
+ CommitTransactionCommand();
+ }
+
+ /*
+ * Last thing to do is release the session-level locks on the parent
+ * tables and their indexes.
+ */
+ foreach(lc, relationLocks)
+ {
+ LockRelId lockRel = * (LockRelId *) lfirst(lc);
+ UnlockRelationIdForSession(&lockRel, ShareUpdateExclusiveLock);
+ }
+
+ /* Start a new transaction to finish process properly */
+ StartTransactionCommand();
+
+ /* Get fresh snapshot for the end of process */
+ PushActiveSnapshot(GetTransactionSnapshot());
+
+ return true;
+}
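
As a reading aid outside the patch proper: if the operation fails between the swap and the final drop, an invalid index carrying the `_cct` suffix is left behind (this is exactly why the concurrent index is never marked valid above). A sketch of how a DBA could spot such leftovers, assuming only the standard `pg_index` catalog:

```
-- Sketch: list invalid indexes left behind by a failed concurrent run.
SELECT indexrelid::regclass AS leftover_index
FROM pg_index
WHERE NOT indisvalid;
-- A leftover named like "ind_cct" can then be removed with
-- DROP INDEX CONCURRENTLY, since it is already dead to new transactions.
```
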
+
+
+/*
* CheckMutability
* Test whether given expression is mutable
*/
@@ -1521,7 +2059,8 @@ ChooseRelationName(const char *name1, const char *name2,
static char *
ChooseIndexName(const char *tabname, Oid namespaceId,
List *colnames, List *exclusionOpNames,
- bool primary, bool isconstraint)
+ bool primary, bool isconstraint,
+ bool concurrent)
{
char *indexname;
@@ -1547,6 +2086,13 @@ ChooseIndexName(const char *tabname, Oid namespaceId,
"key",
namespaceId);
}
+ else if (concurrent)
+ {
+ indexname = ChooseRelationName(tabname,
+ NULL,
+ "cct",
+ namespaceId);
+ }
else
{
indexname = ChooseRelationName(tabname,
@@ -1659,18 +2205,22 @@ ChooseIndexColumnNames(List *indexElems)
* Recreate a specific index.
*/
Oid
-ReindexIndex(RangeVar *indexRelation)
+ReindexIndex(RangeVar *indexRelation, bool concurrent)
{
Oid indOid;
Oid heapOid = InvalidOid;
- /* lock level used here should match index lock reindex_index() */
- indOid = RangeVarGetRelidExtended(indexRelation, AccessExclusiveLock,
- false, false,
- RangeVarCallbackForReindexIndex,
- (void *) &heapOid);
+ indOid = RangeVarGetRelidExtended(indexRelation,
+ concurrent ? ShareUpdateExclusiveLock : AccessExclusiveLock,
+ concurrent, concurrent,
+ RangeVarCallbackForReindexIndex,
+ (void *) &heapOid);
- reindex_index(indOid, false);
+ /* Continue process for concurrent or non-concurrent case */
+ if (!concurrent)
+ reindex_index(indOid, false);
+ else
+ ReindexRelationConcurrently(indOid);
return indOid;
}
@@ -1739,13 +2289,27 @@ RangeVarCallbackForReindexIndex(const RangeVar *relation,
* Recreate all indexes of a table (and of its toast table, if any)
*/
Oid
-ReindexTable(RangeVar *relation)
+ReindexTable(RangeVar *relation, bool concurrent)
{
Oid heapOid;
/* The lock level used here should match reindex_relation(). */
- heapOid = RangeVarGetRelidExtended(relation, ShareLock, false, false,
- RangeVarCallbackOwnsTable, NULL);
+ heapOid = RangeVarGetRelidExtended(relation,
+ concurrent ? ShareUpdateExclusiveLock : ShareLock,
+ concurrent, concurrent,
+ RangeVarCallbackOwnsTable, NULL);
+
+ /* Run through the concurrent process if necessary */
+ if (concurrent)
+ {
+ if (!ReindexRelationConcurrently(heapOid))
+ {
+ ereport(NOTICE,
+ (errmsg("table \"%s\" has no indexes",
+ relation->relname)));
+ }
+ return heapOid;
+ }
if (!reindex_relation(heapOid,
REINDEX_REL_PROCESS_TOAST |
@@ -1766,7 +2330,10 @@ ReindexTable(RangeVar *relation)
* That means this must not be called within a user transaction block!
*/
Oid
-ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
+ReindexDatabase(const char *databaseName,
+ bool do_system,
+ bool do_user,
+ bool concurrent)
{
Relation relationRelation;
HeapScanDesc scan;
@@ -1778,6 +2345,15 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
AssertArg(databaseName);
+ /*
+ * A CONCURRENTLY operation is not allowed for system catalogs, but it
+ * is allowed for a whole database.
+ */
+ if (concurrent && !do_user)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot reindex system concurrently")));
+
if (strcmp(databaseName, get_database_name(MyDatabaseId)) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
@@ -1861,17 +2437,42 @@ ReindexDatabase(const char *databaseName, bool do_system, bool do_user)
foreach(l, relids)
{
Oid relid = lfirst_oid(l);
+ bool result = false;
+ bool process_concurrent;
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
PushActiveSnapshot(GetTransactionSnapshot());
- if (reindex_relation(relid,
- REINDEX_REL_PROCESS_TOAST |
- REINDEX_REL_CHECK_CONSTRAINTS))
+
+ /* Determine if relation needs to be processed concurrently */
+ process_concurrent = concurrent &&
+ !IsSystemNamespace(get_rel_namespace(relid));
+
+ /*
+ * Reindex the relation with a concurrent or non-concurrent process.
+ * System relations cannot be reindexed concurrently, but they still
+ * need to be reindexed (including pg_class) with the normal process,
+ * as they could be corrupted and the concurrent process itself uses
+ * them. This does not include toast relations, which are reindexed
+ * when their parent relation is processed.
+ */
+ if (process_concurrent)
+ {
+ old = MemoryContextSwitchTo(private_context);
+ result = ReindexRelationConcurrently(relid);
+ MemoryContextSwitchTo(old);
+ }
+ else
+ result = reindex_relation(relid,
+ REINDEX_REL_PROCESS_TOAST |
+ REINDEX_REL_CHECK_CONSTRAINTS);
+
+ if (result)
ereport(NOTICE,
- (errmsg("table \"%s.%s\" was reindexed",
+ (errmsg("table \"%s.%s\" was reindexed%s",
get_namespace_name(get_rel_namespace(relid)),
- get_rel_name(relid))));
+ get_rel_name(relid),
+ process_concurrent ? " concurrently" : "")));
PopActiveSnapshot();
CommitTransactionCommand();
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8839f98..400e7b4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -874,6 +874,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
char relkind;
Form_pg_class classform;
LOCKMODE heap_lockmode;
+ bool invalid_system_index = false;
state = (struct DropRelationCallbackState *) arg;
relkind = state->relkind;
@@ -909,7 +910,37 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
aclcheck_error(ACLCHECK_NOT_OWNER, ACL_KIND_CLASS,
rel->relname);
- if (!allowSystemTableMods && IsSystemClass(classform))
+ /*
+ * Check the case of a system index that might have been invalidated by a
+ * failed concurrent process and allow its drop. For the time being, this
+ * only concerns indexes of toast relations that became invalid during a
+ * REINDEX CONCURRENTLY process.
+ */
+ if (IsSystemClass(classform) &&
+ relkind == RELKIND_INDEX)
+ {
+ HeapTuple locTuple;
+ Form_pg_index indexform;
+ bool indisvalid;
+
+ locTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(relOid));
+ if (!HeapTupleIsValid(locTuple))
+ {
+ ReleaseSysCache(tuple);
+ return;
+ }
+
+ indexform = (Form_pg_index) GETSTRUCT(locTuple);
+ indisvalid = indexform->indisvalid;
+ ReleaseSysCache(locTuple);
+
+ /* Mark object as being an invalid index of system catalogs */
+ if (!indisvalid)
+ invalid_system_index = true;
+ }
+
+ /* In the case of an invalid index, it is fine to bypass this check */
+ if (!invalid_system_index && !allowSystemTableMods && IsSystemClass(classform))
ereport(ERROR,
(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
errmsg("permission denied: \"%s\" is a system catalog",
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index 39e3b2e..5495f22 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1201,6 +1201,20 @@ check_exclusion_constraint(Relation heap, Relation index, IndexInfo *indexInfo,
}
/*
+ * An invalid index can only exist if it was created in a concurrent
+ * context, and this code path cannot be reached by CREATE INDEX
+ * CONCURRENTLY, since that feature is not available for exclusion
+ * constraints; hence it can only be reached by REINDEX CONCURRENTLY.
+ * In that case a twin of this index exists in parallel, on which this
+ * check has already been done, so we can bypass it here. If CREATE
+ * INDEX CONCURRENTLY ever supports exclusion constraints, this will
+ * need to be removed or completed for that purpose.
+ */
+ if (!index->rd_index->indisvalid)
+ return true;
+
+ /*
* Search the tuples that are in the index for any violations, including
* tuples that aren't visible yet.
*/
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 65f3b98..25324cd 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3640,6 +3640,7 @@ _copyReindexStmt(const ReindexStmt *from)
COPY_STRING_FIELD(name);
COPY_SCALAR_FIELD(do_system);
COPY_SCALAR_FIELD(do_user);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 4c9b05e..d1e73fc 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1849,6 +1849,7 @@ _equalReindexStmt(const ReindexStmt *a, const ReindexStmt *b)
COMPARE_STRING_FIELD(name);
COMPARE_SCALAR_FIELD(do_system);
COMPARE_SCALAR_FIELD(do_user);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index a9812af..87e2a8b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6789,29 +6789,32 @@ opt_if_exists: IF_P EXISTS { $$ = TRUE; }
*****************************************************************************/
ReindexStmt:
- REINDEX reindex_type qualified_name opt_force
+ REINDEX reindex_type opt_concurrently qualified_name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = $2;
- n->relation = $3;
+ n->concurrent = $3;
+ n->relation = $4;
n->name = NULL;
$$ = (Node *)n;
}
- | REINDEX SYSTEM_P name opt_force
+ | REINDEX SYSTEM_P opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = false;
$$ = (Node *)n;
}
- | REINDEX DATABASE name opt_force
+ | REINDEX DATABASE opt_concurrently name opt_force
{
ReindexStmt *n = makeNode(ReindexStmt);
n->kind = OBJECT_DATABASE;
- n->name = $3;
+ n->concurrent = $3;
+ n->name = $4;
n->relation = NULL;
n->do_system = true;
n->do_user = true;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index b1023c4..1843386 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -778,16 +778,20 @@ standard_ProcessUtility(Node *parsetree,
{
ReindexStmt *stmt = (ReindexStmt *) parsetree;
+ if (stmt->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "REINDEX CONCURRENTLY");
+
/* we choose to allow this during "read only" transactions */
PreventCommandDuringRecovery("REINDEX");
switch (stmt->kind)
{
case OBJECT_INDEX:
- ReindexIndex(stmt->relation);
+ ReindexIndex(stmt->relation, stmt->concurrent);
break;
case OBJECT_TABLE:
case OBJECT_MATVIEW:
- ReindexTable(stmt->relation);
+ ReindexTable(stmt->relation, stmt->concurrent);
break;
case OBJECT_DATABASE:
@@ -799,8 +803,8 @@ standard_ProcessUtility(Node *parsetree,
*/
PreventTransactionChain(isTopLevel,
"REINDEX DATABASE");
- ReindexDatabase(stmt->name,
- stmt->do_system, stmt->do_user);
+ ReindexDatabase(stmt->name, stmt->do_system,
+ stmt->do_user, stmt->concurrent);
break;
default:
elog(ERROR, "unrecognized object type: %d",
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9f29003..ab45c67 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -60,16 +60,25 @@ extern Oid index_create(Relation heapRelation,
bool allow_system_table_mods,
bool skip_build,
bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool is_reindex);
+
+extern Oid index_concurrent_create(Relation heapRelation,
+ Oid indOid,
+ char *concurrentName);
extern void index_concurrent_build(Oid heapOid,
Oid indexOid,
bool isprimary);
+extern void index_concurrent_swap(Oid newIndexOid, Oid oldIndexOid);
+
extern void index_concurrent_set_dead(Oid heapOid,
Oid indexOid,
LOCKTAG locktag);
+extern void index_concurrent_drop(Oid indexOid);
+
extern void index_constraint_create(Relation heapRelation,
Oid indexRelationId,
IndexInfo *indexInfo,
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 836c99e..d78a63e 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -27,10 +27,11 @@ extern Oid DefineIndex(IndexStmt *stmt,
bool check_rights,
bool skip_build,
bool quiet);
-extern Oid ReindexIndex(RangeVar *indexRelation);
-extern Oid ReindexTable(RangeVar *relation);
+extern Oid ReindexIndex(RangeVar *indexRelation, bool concurrent);
+extern Oid ReindexTable(RangeVar *relation, bool concurrent);
extern Oid ReindexDatabase(const char *databaseName,
- bool do_system, bool do_user);
+ bool do_system, bool do_user, bool concurrent);
+extern bool ReindexRelationConcurrently(Oid relOid);
extern char *makeObjectName(const char *name1, const char *name2,
const char *label);
extern char *ChooseRelationName(const char *name1, const char *name2,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 51fef68..4cde473 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2587,6 +2587,7 @@ typedef struct ReindexStmt
const char *name; /* name of database to reindex */
bool do_system; /* include system tables in database case */
bool do_user; /* include user tables in database case */
+ bool concurrent; /* reindex concurrently? */
} ReindexStmt;
/* ----------------------
diff --git a/src/test/isolation/expected/reindex-concurrently.out b/src/test/isolation/expected/reindex-concurrently.out
new file mode 100644
index 0000000..9e04169
--- /dev/null
+++ b/src/test/isolation/expected/reindex-concurrently.out
@@ -0,0 +1,78 @@
+Parsed test spec with 3 sessions
+
+starting permutation: reindex sel1 upd2 ins2 del2 end1 end2
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab;
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+
+starting permutation: sel1 reindex upd2 ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 reindex ins2 del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 reindex del2 end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 reindex end1 end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end1: COMMIT;
+step end2: COMMIT;
+step reindex: <... completed>
+
+starting permutation: sel1 upd2 ins2 del2 end1 reindex end2
+step sel1: SELECT data FROM reind_con_tab WHERE id = 3;
+data
+
+aaaa
+step upd2: UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3;
+step ins2: INSERT INTO reind_con_tab(data) VALUES ('cccc');
+step del2: DELETE FROM reind_con_tab WHERE data = 'cccc';
+step end1: COMMIT;
+step reindex: REINDEX TABLE CONCURRENTLY reind_con_tab; <waiting ...>
+step end2: COMMIT;
+step reindex: <... completed>
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 081e11f..fb4c1a9 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -20,4 +20,5 @@ test: delete-abort-savept
test: delete-abort-savept-2
test: aborted-keyrevoke
test: drop-index-concurrently-1
+test: reindex-concurrently
test: timeouts
diff --git a/src/test/isolation/specs/reindex-concurrently.spec b/src/test/isolation/specs/reindex-concurrently.spec
new file mode 100644
index 0000000..eb59fe0
--- /dev/null
+++ b/src/test/isolation/specs/reindex-concurrently.spec
@@ -0,0 +1,40 @@
+# REINDEX CONCURRENTLY
+#
+# Ensure that concurrent operations work correctly when a REINDEX is performed
+# concurrently.
+
+setup
+{
+ CREATE TABLE reind_con_tab(id serial primary key, data text);
+ INSERT INTO reind_con_tab(data) VALUES ('aa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaa');
+ INSERT INTO reind_con_tab(data) VALUES ('aaaaa');
+}
+
+teardown
+{
+ DROP TABLE reind_con_tab;
+}
+
+session "s1"
+setup { BEGIN; }
+step "sel1" { SELECT data FROM reind_con_tab WHERE id = 3; }
+step "end1" { COMMIT; }
+
+session "s2"
+setup { BEGIN; }
+step "upd2" { UPDATE reind_con_tab SET data = 'bbbb' WHERE id = 3; }
+step "ins2" { INSERT INTO reind_con_tab(data) VALUES ('cccc'); }
+step "del2" { DELETE FROM reind_con_tab WHERE data = 'cccc'; }
+step "end2" { COMMIT; }
+
+session "s3"
+step "reindex" { REINDEX TABLE CONCURRENTLY reind_con_tab; }
+
+permutation "reindex" "sel1" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "reindex" "upd2" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "reindex" "ins2" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "reindex" "del2" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "reindex" "end1" "end2"
+permutation "sel1" "upd2" "ins2" "del2" "end1" "reindex" "end2"
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 81c64e5..a613227 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2741,3 +2741,60 @@ ORDER BY thousand;
1 | 1001
(2 rows)
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+NOTICE: table "concur_reindex_tab" has no indexes
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+ERROR: REINDEX CONCURRENTLY cannot run inside a transaction block
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+ERROR: concurrent reindex is not supported for shared relations
+REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations
+ERROR: concurrent reindex is not supported for catalog relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+ERROR: cannot reindex system concurrently
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+Table "public.concur_reindex_tab"
+ Column | Type | Modifiers
+--------+---------+-----------
+ c1 | integer | not null
+ c2 | text |
+Indexes:
+ "concur_reindex_ind1" PRIMARY KEY, btree (c1)
+ "concur_reindex_ind3" UNIQUE, btree (abs(c1))
+ "concur_reindex_ind2" btree (c2)
+ "concur_reindex_ind4" btree (c1, c1, c2)
+Referenced by:
+ TABLE "concur_reindex_tab2" CONSTRAINT "concur_reindex_tab2_c1_fkey" FOREIGN KEY (c1) REFERENCES concur_reindex_tab(c1)
+
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 4ee8581..aacfb58 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -915,3 +915,44 @@ ORDER BY thousand;
SELECT thousand, tenthous FROM tenk1
WHERE thousand < 2 AND tenthous IN (1001,3000)
ORDER BY thousand;
+
+--
+-- Check behavior of REINDEX and REINDEX CONCURRENTLY
+--
+CREATE TABLE concur_reindex_tab (c1 int);
+-- REINDEX
+REINDEX TABLE concur_reindex_tab; -- notice
+REINDEX TABLE CONCURRENTLY concur_reindex_tab; -- notice
+ALTER TABLE concur_reindex_tab ADD COLUMN c2 text; -- add toast index
+-- Normal index with integer column
+CREATE UNIQUE INDEX concur_reindex_ind1 ON concur_reindex_tab(c1);
+-- Normal index with text column
+CREATE INDEX concur_reindex_ind2 ON concur_reindex_tab(c2);
+-- UNIQUE index with expression
+CREATE UNIQUE INDEX concur_reindex_ind3 ON concur_reindex_tab(abs(c1));
+-- Duplicate column names
+CREATE INDEX concur_reindex_ind4 ON concur_reindex_tab(c1, c1, c2);
+-- Create table for check on foreign key dependence switch with indexes swapped
+ALTER TABLE concur_reindex_tab ADD PRIMARY KEY USING INDEX concur_reindex_ind1;
+CREATE TABLE concur_reindex_tab2 (c1 int REFERENCES concur_reindex_tab);
+INSERT INTO concur_reindex_tab VALUES (1, 'a');
+INSERT INTO concur_reindex_tab VALUES (2, 'a');
+-- Check materialized views
+CREATE MATERIALIZED VIEW concur_reindex_matview AS SELECT * FROM concur_reindex_tab;
+REINDEX INDEX CONCURRENTLY concur_reindex_ind1;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+REINDEX TABLE CONCURRENTLY concur_reindex_matview;
+
+-- Check errors
+-- Cannot run inside a transaction block
+BEGIN;
+REINDEX TABLE CONCURRENTLY concur_reindex_tab;
+COMMIT;
+REINDEX TABLE CONCURRENTLY pg_database; -- no shared relation
+REINDEX TABLE CONCURRENTLY pg_class; -- no catalog relations
+REINDEX SYSTEM CONCURRENTLY postgres; -- not allowed for SYSTEM
+
+-- Check the relation status, there should not be invalid indexes
+\d concur_reindex_tab
+DROP MATERIALIZED VIEW concur_reindex_matview;
+DROP TABLE concur_reindex_tab, concur_reindex_tab2;
I am marking this patch as "returned with feedback", as I will not be able to
work on it by the 15th of October. It would have been great to get
the infrastructure patches 0002 and 0003 committed to minimize the
work on the core patch, but that is not the case.
I am attaching as well a patch fixing some comments in index_drop, as
mentioned by Andres in another thread, so that it doesn't get lost in
the flow.
Thanks to all for the involvement.
Regards,
--
Michael
Attachments:
20131002_index_drop_comments.patch (application/octet-stream)
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 826e504..41b7866 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1444,9 +1444,11 @@ index_drop(Oid indexId, bool concurrent)
/*
* Now we must wait until no running transaction could be using the
- * index for a query. Note we do not need to worry about xacts that
- * open the table for reading after this point; they will see the
- * index as invalid when they open the relation.
+ * index for a query. This is done with AccessExclusiveLock to check
+ * which running transaction has a lock of any kind on the table.
+ * Note we do not need to worry about xacts that open the table for
+ * reading after this point; they will see the index as invalid when
+ * they open the relation.
*
* Note: the reason we use actual lock acquisition here, rather than
* just checking the ProcArray and sleeping, is that deadlock is
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 2155252..c952bc3 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -651,9 +651,10 @@ DefineIndex(IndexStmt *stmt,
* for an overview of how this works)
*
* Now we must wait until no running transaction could have the table open
- * with the old list of indexes. Note we do not need to worry about xacts
- * that open the table for writing after this point; they will see the new
- * index when they open it.
+ * with the old list of indexes. This is done with ShareLock to check
+ * which running transaction holds a lock that permits writing the table.
+ * Note we do not need to worry about xacts that open the table for
+ * writing after this point; they will see the new index when they open it.
*
* Note: the reason we use actual lock acquisition here, rather than just
* checking the ProcArray and sleeping, is that deadlock is possible if
On 2013-10-02 13:16:06 +0900, Michael Paquier wrote:
Each patch applied with its parents compiles, has no warnings AFAIK,
and passes the regression/isolation tests. Finishing 0004 by the end of
the CF seems out of reach IMO, so I'd suggest focusing on 0002 and
0003 now; I can put in some time to finalize them for this CF. I
think that we should perhaps split 0003 into two pieces: one patch
for the introduction of index_concurrent_build, and another for
index_concurrent_set_dead. Comments are welcome on that, and
if people agree I'll do it once 0002 is finalized.
FWIW I don't think splitting off index_concurrent_build is worthwhile...
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services