Should I implement DROP INDEX CONCURRENTLY?
Hello list,
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently, while frobbing around some indexes, I realized that there is
no equivalent for DROP INDEX. This is a similar but lesser problem (as
CREATE INDEX takes much longer): DROP INDEX takes an ACCESS EXCLUSIVE
lock on the parent table while doing the work to unlink files, which
nominally one would think to be trivial, but I assure you it is not at
times, even for indexes that are only a handful of gigabytes (let's
say a dozen or fewer). By non-trivial, I mean it can take 30+
seconds, though less than a couple of minutes. The storage layer
(starting from the higher levels of abstraction) is XFS, a somewhat
trivial LVM setup, mdraid (8-way), Amazon EBS (NBD?).
I was poking around at tablecmds and index.c and wonder if a similar
two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
create a DROP INDEX CONCURRENTLY, and if there would be any interest
in accepting such a patch.
Quoth index.c:
/*
* To drop an index safely, we must grab exclusive lock on its parent
* table. Exclusive lock on the index alone is insufficient because
* another backend might be about to execute a query on the parent table.
* If it relies on a previously cached list of index OIDs, then it could
* attempt to access the just-dropped index. We must therefore take a
* table lock strong enough to prevent all queries on the table from
* proceeding until we commit and send out a shared-cache-inval notice
* that will make them update their index lists.
*/
Could I make the ACCESS EXCLUSIVE section just long enough to commit
catalog updates, and then have the bulk of the work happen afterwards?
The general idea is:
1) set an index as "invalid", to ensure no backend will use it in planning
2) wait for the xmin horizon to advance, ensuring that any open
snapshots that might not see the invalidation of the index are gone
(is there a way to tighten that up? although even this conservative
version would be 80-90% of the value for us...)
3) then use performDeletions without taking a lock on the parent
table, similar to what's in tablecmds.c already.
A DROP INDEX CONCURRENTLY may leave an invalid index if aborted
instead of waiting for statement confirmation, just like CREATE INDEX
CONCURRENTLY.
--
fdr
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel@heroku.com> wrote:
Hello list,
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~=< a dozen). By non-trivial, I mean it can take 30+
seconds, but less than a couple of minutes. The storage layer
(starting from the higher levels of abstraction) are XFS, a somewhat
trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).
Are you sure that you are really waiting on the time to unlink the
file? There's other stuff going on in there, like waiting for locks,
plan invalidation, etc. Point being, maybe the time-consuming stuff
can't really be deferred, which would make the proposal moot.
merlin
On Wed, Aug 24, 2011 at 2:24 PM, Daniel Farina <daniel@heroku.com> wrote:
Could I make the ACCESS EXCLUSIVE section just long enough to commit
catalog updates, and then have the bulk of the work happen afterwards?
The general idea is:
1) set an index as "invalid", to ensure no backend will use it in planning
2) wait for the xmin horizon to advance to ensure no open snapshots
that may not see the invalidation of the index are gone (is there a
way to tighten that up? although even this conservative version would
be 80-90% of the value for us...)
3) then use performDeletions without taking a lock on the parent
table, similar to what's in tablecmds.c already.
A DROP INDEX CONCURRENTLY may leave an invalid index if aborted
instead of waiting for statement confirmation, just like CREATE INDEX
CONCURRENTLY.
This might be a dumb idea, but could we rearrange CommitTransaction()
so that smgrDoPendingDeletes() happens just a bit further down, after
those ResourceOwnerRelease() calls? It seems like that might
accomplish what you're trying to do here without needing a new
command.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Merlin Moncure <mmoncure@gmail.com> writes:
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel@heroku.com> wrote:
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~=< a dozen).
Are you sure that you are really waiting on the time to unlink the
file? there's other stuff going on in there like waiting for lock,
plan invalidation, etc. Point being, maybe the time consuming stuff
can't really be deferred which would make the proposal moot.
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
regards, tom lane
On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel@heroku.com> wrote:
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~=< a dozen).
Are you sure that you are really waiting on the time to unlink the
file? there's other stuff going on in there like waiting for lock,
plan invalidation, etc. Point being, maybe the time consuming stuff
can't really be deferred which would make the proposal moot.
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
Alright, since this concern about confirming the expensive part of
index dropping has come up a few times but otherwise the waters are
warm, I'll go ahead and do some work to pin things down a bit before
we continue working on those assumptions.
--
fdr
On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina <daniel@heroku.com> wrote:
On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
Alright, since this concern about confirming the expensive part of
index dropping has come up a few times but otherwise the waters are
warm, I'll go ahead and do some work to pin things down a bit before
we continue working on those assumptions.
This suspicion seems to be proven correct; an opportunity arose where
we were removing some indexes on a live system, and I took the chance
to carefully control and time the process. There's not much
relationship between the size of the index and the delay, but the
pauses are still very real. On the other hand, the first time this was
noticed there was significantly higher load.
I'd still like to do something to solve this problem, though: even if
the time-consuming part of the process is not file unlinking, it's
clearly something that happens after the AccessExclusiveLock is
acquired, based on our other measurements.
Back to the drawing board...
--
fdr
On Wed, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~=< a dozen). By non-trivial, I mean it can take 30+
seconds, but less than a couple of minutes. The storage layer
(starting from the higher levels of abstraction) are XFS, a somewhat
trivial lvm setup, mdraid (8-ways), Amazon EBS (NBD?).
I was poking around at tablecmds and index.c and wonder if a similar
two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
create a DROP INDEX CONCURRENTLY, and if there would be any interest
in accepting such a patch.
Hmm, it seems I just independently came up with this same concept. My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up. If the table is
under constant load, you can't easily do that. So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.
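For context, the invalid index that a failed CREATE INDEX CONCURRENTLY leaves behind can be spotted with a catalog query like the following. This query is illustrative, not part of the patch, though the catalog columns it uses are standard:

```sql
-- List indexes marked invalid, e.g. leftovers from a failed
-- CREATE INDEX CONCURRENTLY
SELECT n.nspname AS schema, c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE NOT i.indisvalid;
```

Without a concurrent drop, removing any index found this way on a busy table requires the same ACCESS EXCLUSIVE lock described above.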
On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On Wed, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
I was poking around at tablecmds and index.c and wonder if a similar
two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
create a DROP INDEX CONCURRENTLY, and if there would be any interest
in accepting such a patch.
Hmm, it seems I just independently came up with this same concept. My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up. If the table is
under constant load, you can't easily do that. So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.
Here's a patch for this. Please review.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
drop_index_concurrently.v1.patch (text/x-patch)
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..a5abb5b 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+ [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+ CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>
</synopsis>
</refsynopsisdiv>
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ..
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will drop the
+ index without taking any locks that prevent concurrent selects, inserts,
+ updates, or deletes on the table; whereas a standard index drop
+ waits for a lock that locks out everything on the table until it's done.
+ Concurrent drop index is a two stage process. First, we mark the index
+ invalid and then commit the change. Next we wait until there are no
+ users locking the table who can see the invalid index.
+ </para>
+ <para>
+ There are several caveats to be aware of when using this option.
+ Only one index name can be specified if the <literal>CONCURRENTLY</literal>
+ parameter is specified. Only one concurrent index drop can occur on a
+ table at a time and no modifications on the table are allowed meanwhile.
+ Regular <command>DROP INDEX</> command can be performed within
+ a transaction block, but <command>DROP INDEX CONCURRENTLY</> cannot.
+ There is no CASCADE option when dropping an index concurrently.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="PARAMETER">name</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 223c097..656af21 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -171,8 +171,9 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
DropBehavior behavior,
int msglevel,
const ObjectAddress *origObject);
-static void deleteOneObject(const ObjectAddress *object, Relation depRel);
-static void doDeletion(const ObjectAddress *object);
+static void deleteOneObject(const ObjectAddress *object, Relation depRel,
+ int option_flags);
+static void doDeletion(const ObjectAddress *object, int option_flags);
static void AcquireDeletionLock(const ObjectAddress *object);
static void ReleaseDeletionLock(const ObjectAddress *object);
static bool find_expr_references_walker(Node *node,
@@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object,
{
ObjectAddress *thisobj = targetObjects->refs + i;
- deleteOneObject(thisobj, depRel);
+ deleteOneObject(thisobj, depRel, 0);
}
/* And clean up */
@@ -276,6 +277,13 @@ void
performMultipleDeletions(const ObjectAddresses *objects,
DropBehavior behavior)
{
+ performMultipleDeletionsWithOptions(objects, behavior, 0);
+}
+
+void
+performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+ DropBehavior behavior, int option_flags)
+{
Relation depRel;
ObjectAddresses *targetObjects;
int i;
@@ -336,13 +344,17 @@ performMultipleDeletions(const ObjectAddresses *objects,
{
ObjectAddress *thisobj = targetObjects->refs + i;
- deleteOneObject(thisobj, depRel);
+ deleteOneObject(thisobj, depRel, option_flags);
}
/* And clean up */
free_object_addresses(targetObjects);
- heap_close(depRel, RowExclusiveLock);
+ /*
+ * We closed depRel earlier if doing a drop concurrently
+ */
+ if ((option_flags & OPT_CONCURRENTLY) != OPT_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
}
/*
@@ -407,7 +419,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
if (thisextra->flags & DEPFLAG_ORIGINAL)
continue;
- deleteOneObject(thisobj, depRel);
+ deleteOneObject(thisobj, depRel, 0);
}
/* And clean up */
@@ -950,7 +962,7 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
* depRel is the already-open pg_depend relation.
*/
static void
-deleteOneObject(const ObjectAddress *object, Relation depRel)
+deleteOneObject(const ObjectAddress *object, Relation depRel, int option_flags)
{
ScanKeyData key[3];
int nkeys;
@@ -1001,9 +1013,16 @@ deleteOneObject(const ObjectAddress *object, Relation depRel)
object->objectSubId);
/*
+ * Close depRel if we are doing a drop concurrently because it
+ * commits the transaction, so we don't want dangling references.
+ */
+ if ((option_flags & OPT_CONCURRENTLY) == OPT_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
+
+ /*
* Now delete the object itself, in an object-type-dependent way.
*/
- doDeletion(object);
+ doDeletion(object, option_flags);
/*
* Delete any comments or security labels associated with this object.
@@ -1028,7 +1047,7 @@ deleteOneObject(const ObjectAddress *object, Relation depRel)
* doDeletion: actually delete a single object
*/
static void
-doDeletion(const ObjectAddress *object)
+doDeletion(const ObjectAddress *object, int option_flags)
{
switch (getObjectClass(object))
{
@@ -1038,8 +1057,11 @@ doDeletion(const ObjectAddress *object)
if (relKind == RELKIND_INDEX)
{
+ bool concurrent = ((option_flags & OPT_CONCURRENTLY)
+ == OPT_CONCURRENTLY);
+
Assert(object->objectSubId == 0);
- index_drop(object->objectId);
+ index_drop(object->objectId, concurrent);
}
else
{
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index f9075c4..b0ddd18 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1282,7 +1282,7 @@ index_constraint_create(Relation heapRelation,
* else associated dependencies won't be cleaned up.
*/
void
-index_drop(Oid indexId)
+index_drop(Oid indexId, bool concurrent)
{
Oid heapId;
Relation userHeapRelation;
@@ -1290,6 +1290,12 @@ index_drop(Oid indexId)
Relation indexRelation;
HeapTuple tuple;
bool hasexprs;
+ LockRelId heaprelid,
+ indexrelid;
+ LOCKTAG heaplocktag,
+ indexlocktag;
+ VirtualTransactionId *old_lockholders;
+ Form_pg_index indexForm;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1302,9 +1308,16 @@ index_drop(Oid indexId)
* that will make them update their index lists.
*/
heapId = IndexGetRelation(indexId, false);
- userHeapRelation = heap_open(heapId, AccessExclusiveLock);
-
- userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ if (concurrent)
+ {
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ }
+ else
+ {
+ userHeapRelation = heap_open(heapId, AccessExclusiveLock);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ }
/*
* There can no longer be anyone *else* touching the index, but we might
@@ -1313,6 +1326,100 @@ index_drop(Oid indexId)
CheckTableNotInUse(userIndexRelation, "DROP INDEX");
/*
+ * Drop Index concurrently is similar in many ways to creating an
+ * index concurrently, so some actions are similar to DefineIndex()
+ */
+ if (concurrent)
+ {
+ /*
+ * Mark index invalid by updating its pg_index entry
+ *
+ * Don't Assert(indexForm->indisvalid) because we may be trying to
+ * clear up after an error when trying to create an index which left
+ * the index invalid
+ */
+ indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexId));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for index %u", indexId);
+ indexForm = (Form_pg_index) GETSTRUCT(tuple);
+
+ indexForm->indisvalid = false;
+
+ simple_heap_update(indexRelation, &tuple->t_self, tuple);
+ CatalogUpdateIndexes(indexRelation, tuple);
+
+ heap_close(indexRelation, RowExclusiveLock);
+
+ /* save lockrelid and locktag for below, then close but keep locks */
+ heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
+ heap_close(userHeapRelation, NoLock);
+
+ indexrelid = userIndexRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(indexlocktag, indexrelid.dbId, indexrelid.relId);
+ index_close(userIndexRelation, NoLock);
+
+ /*
+ * For a concurrent drop, it's important to make the catalog entries
+ * visible to other transactions before we drop the index. The index
+ * will be marked not indisvalid, so that no one else tries to either
+ * insert into it or use it for queries.
+ *
+ * We must commit our current transaction so that the index update becomes
+ * visible; then start another. Note that all the data structures we just
+ * built are lost in the commit. The only data we keep past here are the
+ * relation IDs.
+ *
+ * Before committing, get a session-level lock on the table, to ensure
+ * that neither it nor the index can be dropped before we finish. This
+ * cannot block, even if someone else is waiting for access, because we
+ * already have the same lock within our transaction.
+ */
+ LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ LockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Now we must wait until no running transaction could have the table open
+ * with the old list of indexes. To do this, inquire which xacts
+ * currently would conflict with AccessExclusiveLock on the table -- ie,
+ * which ones have a lock of any kind on the table. Then wait for each of
+ * these xacts to commit or abort. Note we do not need to worry about
+ * xacts that open the table for writing after this point; they will see
+ * the index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than just
+ * checking the ProcArray and sleeping, is that deadlock is possible if
+ * one of the transactions in question is blocked trying to acquire an
+ * exclusive lock on our table. The lock code will detect deadlock and
+ * error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+ old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+
+ /*
+ * Re-open relations to allow us to complete our actions
+ */
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ }
+
+ /*
* All predicate locks on the index are about to be made invalid. Promote
* them to relation locks on the heap.
*/
@@ -1378,6 +1485,15 @@ index_drop(Oid indexId)
* Close owning rel, but keep lock
*/
heap_close(userHeapRelation, NoLock);
+
+ /*
+ * Release the session locks before we go.
+ */
+ if (concurrent)
+ {
+ UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ UnlockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 4953def..7a15ea4 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -239,6 +239,7 @@ struct DropRelationCallbackState
{
char relkind;
Oid heapOid;
+ bool concurrent;
};
/* Alter table target-type flags for ATSimplePermissions */
@@ -730,6 +731,7 @@ RemoveRelations(DropStmt *drop)
ObjectAddresses *objects;
char relkind;
ListCell *cell;
+ int option_flags = 0;
/*
* First we identify all the relations, then we delete them in a single
@@ -792,7 +794,14 @@ RemoveRelations(DropStmt *drop)
/* Look up the appropriate relation using namespace search. */
state.relkind = relkind;
state.heapOid = InvalidOid;
- relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock, true,
+ state.concurrent = drop->concurrent;
+ if (drop->concurrent)
+ relOid = RangeVarGetRelidExtended(rel, ShareUpdateExclusiveLock, true,
+ false,
+ RangeVarCallbackForDropRelation,
+ (void *) &state);
+ else
+ relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock, true,
false,
RangeVarCallbackForDropRelation,
(void *) &state);
@@ -812,7 +821,20 @@ RemoveRelations(DropStmt *drop)
add_exact_object_address(&obj, objects);
}
- performMultipleDeletions(objects, drop->behavior);
+ /*
+ * Set options and check further requirements for concurrent drop
+ */
+ if (drop->concurrent)
+ {
+ /*
+ * Confirm that concurrent behaviour is restricted in grammar.
+ */
+ Assert(drop->removeType == OBJECT_INDEX);
+
+ option_flags |= OPT_CONCURRENTLY;
+ }
+
+ performMultipleDeletionsWithOptions(objects, drop->behavior, option_flags);
free_object_addresses(objects);
}
@@ -881,7 +903,8 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
{
state->heapOid = IndexGetRelation(relOid, true);
if (OidIsValid(state->heapOid))
- LockRelationOid(state->heapOid, AccessExclusiveLock);
+ LockRelationOid(state->heapOid,
+ (state->concurrent ? ShareUpdateExclusiveLock: AccessExclusiveLock));
}
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index dd2dd25..025a6c1 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2761,6 +2761,7 @@ _copyDropStmt(const DropStmt *from)
COPY_SCALAR_FIELD(removeType);
COPY_SCALAR_FIELD(behavior);
COPY_SCALAR_FIELD(missing_ok);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index c2fdb2b..42ecaac 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1187,6 +1187,7 @@ _equalDropStmt(const DropStmt *a, const DropStmt *b)
COMPARE_SCALAR_FIELD(removeType);
COMPARE_SCALAR_FIELD(behavior);
COMPARE_SCALAR_FIELD(missing_ok);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 8943c5b..35f786d 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -213,7 +213,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
CreatedbStmt DeclareCursorStmt DefineStmt DeleteStmt DiscardStmt DoStmt
DropGroupStmt DropOpClassStmt DropOpFamilyStmt DropPLangStmt DropStmt
DropAssertStmt DropTrigStmt DropRuleStmt DropCastStmt DropRoleStmt
- DropUserStmt DropdbStmt DropTableSpaceStmt DropFdwStmt
+ DropUserStmt DropdbStmt DropTableSpaceStmt DropFdwStmt DropIndexStmt
DropForeignServerStmt DropUserMappingStmt ExplainStmt FetchStmt
GrantStmt GrantRoleStmt IndexStmt InsertStmt ListenStmt LoadStmt
LockStmt NotifyStmt ExplainableStmt PreparableStmt
@@ -307,7 +307,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
opt_column_list columnList opt_name_list
sort_clause opt_sort_clause sortby_list index_params
name_list from_clause from_list opt_array_bounds
- qualified_name_list any_name any_name_list
+ qualified_name_list any_name any_name_list any_name_single
any_operator expr_list attrs
target_list insert_column_list set_target_list
set_clause_list set_clause multiple_set_clause
@@ -741,6 +741,7 @@ stmt :
| DropFdwStmt
| DropForeignServerStmt
| DropGroupStmt
+ | DropIndexStmt
| DropOpClassStmt
| DropOpFamilyStmt
| DropOwnedStmt
@@ -4751,7 +4752,6 @@ DropStmt: DROP drop_type IF_P EXISTS any_name_list opt_drop_behavior
drop_type: TABLE { $$ = OBJECT_TABLE; }
| SEQUENCE { $$ = OBJECT_SEQUENCE; }
| VIEW { $$ = OBJECT_VIEW; }
- | INDEX { $$ = OBJECT_INDEX; }
| FOREIGN TABLE { $$ = OBJECT_FOREIGN_TABLE; }
| TYPE_P { $$ = OBJECT_TYPE; }
| DOMAIN_P { $$ = OBJECT_DOMAIN; }
@@ -4780,6 +4780,44 @@ attrs: '.' attr_name
{ $$ = lappend($1, makeString($3)); }
;
+any_name_single:
+ any_name { $$ = list_make1($1); }
+ ;
+
+DropIndexStmt: DROP INDEX IF_P EXISTS any_name_list
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = TRUE;
+ n->objects = $5;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = false;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX any_name_list
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = FALSE;
+ n->objects = $3;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = false;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX CONCURRENTLY any_name_single
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = FALSE;
+ n->objects = $4;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = true;
+ $$ = (Node *)n;
+ }
+ ;
/*****************************************************************************
*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 36d9edc..61337b8 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -631,10 +631,15 @@ standard_ProcessUtility(Node *parsetree,
case T_DropStmt:
switch (((DropStmt *) parsetree)->removeType)
{
+ case OBJECT_INDEX:
+ if (((DropStmt *) parsetree)->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "DROP INDEX CONCURRENTLY");
+ /* fall through */
+
case OBJECT_TABLE:
case OBJECT_SEQUENCE:
case OBJECT_VIEW:
- case OBJECT_INDEX:
case OBJECT_FOREIGN_TABLE:
RemoveRelations((DropStmt *) parsetree);
break;
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 5302e50..77a1ede 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -158,6 +158,11 @@ extern void performDeletion(const ObjectAddress *object,
extern void performMultipleDeletions(const ObjectAddresses *objects,
DropBehavior behavior);
+extern void performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+ DropBehavior behavior, int option_flags);
+/* options specified for object deletion */
+#define OPT_CONCURRENTLY 0x01
+
extern void deleteWhatDependsOn(const ObjectAddress *object,
bool showNotices);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 9efd9af..8916980 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -63,7 +63,7 @@ extern void index_constraint_create(Relation heapRelation,
bool update_pgindex,
bool allow_system_table_mods);
-extern void index_drop(Oid indexId);
+extern void index_drop(Oid indexId, bool concurrent);
extern IndexInfo *BuildIndexInfo(Relation index);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 6e8b110..67612f6 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1905,6 +1905,7 @@ typedef struct DropStmt
ObjectType removeType; /* object type */
DropBehavior behavior; /* RESTRICT or CASCADE behavior */
bool missing_ok; /* skip error if object is missing? */
+ bool concurrent; /* drop index concurrently? */
} DropStmt;
/* ----------------------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 18457e0..c0697e5 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2204,6 +2204,38 @@ Indexes:
"concur_index5" btree (f2) WHERE f1 = 'x'::text
"std_index" btree (f2)
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2";
+-- failures
+DROP INDEX IF EXISTS CONCURRENTLY;
+ERROR: syntax error at or near "CONCURRENTLY"
+LINE 1: DROP INDEX IF EXISTS CONCURRENTLY;
+ ^
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+ERROR: syntax error at or near ","
+LINE 1: DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+ ^
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ERROR: DROP INDEX CONCURRENTLY cannot run inside a transaction block
+ROLLBACK;
+-- successes
+DROP INDEX CONCURRENTLY "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+\d concur_heap
+Table "public.concur_heap"
+ Column | Type | Modifiers
+--------+------+-----------
+ f1 | text |
+ f2 | text |
+Indexes:
+ "std_index" btree (f2)
+
DROP TABLE concur_heap;
--
-- Test ADD CONSTRAINT USING INDEX
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 8c60cb6..892f04b 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -695,6 +695,27 @@ COMMIT;
\d concur_heap
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2";
+
+-- failures
+DROP INDEX IF EXISTS CONCURRENTLY;
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ROLLBACK;
+
+-- successes
+DROP INDEX CONCURRENTLY "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+
+\d concur_heap
+
DROP TABLE concur_heap;
--
On Fri, Sep 9, 2011 at 11:02 PM, Daniel Farina <daniel@heroku.com> wrote:
On Wed, Aug 24, 2011 at 1:04 PM, Daniel Farina <daniel@heroku.com> wrote:
On Wed, Aug 24, 2011 at 12:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.

Alright, since this concern about confirming the expensive part of
index dropping has come up a few times but otherwise the waters are
warm, I'll go ahead and do some work to pin things down a bit before
we continue working on those assumptions.

This suspicion seems to be proven correct; there came an opportunity
where we were removing some indexes on a live system and I took the
opportunity to carefully control and time the process. There's not
much relationship between size of the index and the delay, but the
pauses are still very real. On the other hand, the first time this was
noticed there was significantly higher load.

I'd still like to do something to solve this problem, though: even if
the time-consuming part of the process is not file unlinking, it's
clearly something after the AccessExclusiveLock is acquired based on
our other measurements.
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.
On top of that, taking what Robert Haas mentioned on another thread,
InvalidateBuffer currently calls StrategyFreeBuffer(), which waits for
an ExclusiveLock on the BufFreelistLock. On a busy system this will be
heavily contended, so adding blocks to the freelist only if the lock
is free seems warranted.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
strategyfreebuffer.v1.patch (text/x-patch)
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..af7215f 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -218,12 +218,21 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
}
/*
- * StrategyFreeBuffer: put a buffer on the freelist
+ * StrategyFreeBuffer: put a buffer on the freelist, unless we're busy
*/
void
StrategyFreeBuffer(volatile BufferDesc *buf)
{
- LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
+ /*
+ * The buffer is already invalidated and is now an allocation target.
+ * Adding buffers back onto the freelist is an optimisation only,
+ * so we can decide to skip this step if the lock is busy.
+ * This improves the speed of dropping indexes and tables on a busy system.
+ * If the system is busy the newly invalidated buffers will be reallocated
+ * within one clock sweep.
+ */
+ if (!LWLockConditionalAcquire(BufFreelistLock, LW_EXCLUSIVE))
+ return;
/*
* It is possible that we are told to put something in the freelist that
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.

On top of that, taking what Robert Haas mentioned on another thread,
InvalidateBuffer currently calls StrategyFreeBuffer(), which waits for
an ExclusiveLock on the BufFreelistLock. On a busy system this will be
heavily contended, so adding blocks to the freelist only if the lock
is free seems warranted.
Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
In fact, IIRC the function that scans for buffers actually checks to see if a rel still exists before it returns the buffer...
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes:
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.
Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.
I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist. Simon is after a different solution involving getting
rid of the clock sweep, but he has failed to explain how that's not
going to end up being the same type of contention-prone coding that we
got rid of by adopting the clock sweep, some years ago. Yeah, the sweep
takes a lot of spinlocks, but that only matters if there is contention
for them, and the sweep approach avoids the need for a centralized data
structure.
(BTW, do we have a separate clock sweep hand for each backend? If not,
there might be some low hanging fruit there.)
regards, tom lane
On Jan 3, 2012, at 5:28 PM, Tom Lane wrote:
Jim Nasby <jim@nasby.net> writes:
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.

Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist. Simon is after a different solution involving getting
rid of the clock sweep, but he has failed to explain how that's not
going to end up being the same type of contention-prone coding that we
got rid of by adopting the clock sweep, some years ago. Yeah, the sweep
takes a lot of spinlocks, but that only matters if there is contention
for them, and the sweep approach avoids the need for a centralized data
structure.
Yeah, but the problem we run into is that with every backend trying to run the clock on its own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
Heh, it occurs to me that the SQL analogy for how things work right now is that backends currently have to run a SeqScan (or 5) to find a free page... what we need to do is CREATE INDEX free ON buffers(buffer_id) WHERE count = 0;.
(BTW, do we have a separate clock sweep hand for each backend? If not,
there might be some low hanging fruit there.)
No... having multiple clock hands is an interesting idea, but I'm worried that it could potentially get us into trouble if scores of backends were suddenly decrementing usage counts all over the place. For example, what if 5 backends all had their hands in basically the same place, all pointing at a very heavily used buffer. All 5 backends go for free space, they each grab the spinlock on that buffer in succession and suddenly this highly used buffer that started with a count of 5 has now been freed. We could potentially use more than one hand, but I think the relation between the number of hands and the maximum usage count has to be tightly controlled.
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
Jim Nasby <jim@nasby.net> writes:
Yeah, but the problem we run into is that with every backend trying to run the clock on its own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.
Still, he who wants to do the work can try whatever approach he feels
like.
regards, tom lane
On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jim Nasby <jim@nasby.net> writes:
Yeah, but the problem we run into is that with every backend trying to run the clock on its own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.

Still, he who wants to do the work can try whatever approach he feels
like.
It might be possible to partition the free list. So imagine, for
example, 8 free lists. The background writer runs the clock sweep and
finds some buffers that are about ready to be reallocated and
distributes one-eighth of them to each free list. Then, when a
backend wants to allocate a buffer, it picks a free list somehow
(round robin?) and pulls a buffer off it. If the buffer turns out to
have been used since it was added to the free list, we give up on it
and try again. This hopefully shouldn't happen too often, though, as
long as we only add enough buffers to the free list to satisfy the
requests that we expect to get over the next
small-fraction-of-a-second.
Of course you have to think about what happens if you find that your
chosen free list is empty. In that case you probably want to try a
different one, and if in the worst case where they're all empty, run
the clock sweep in the foreground. You probably also want to kick the
background writer in the pants if even a single one is empty (or
nearly empty?) and tell it to hurry up and add some more buffers to
the freelist.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jan 3, 2012, at 7:34 PM, Robert Haas wrote:
On Tue, Jan 3, 2012 at 7:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jim Nasby <jim@nasby.net> writes:
Yeah, but the problem we run into is that with every backend trying to run the clock on its own we end up with high contention again... it's just in a different place than when we had a true LRU. The clock sweep might be cheaper than the linked list was, but it's still awfully expensive. I believe our best bet is to have a free list that is actually useful in normal operations, and then optimize the cost of pulling buffers out of that list as much as possible (and let the bgwriter deal with keeping enough pages in that list to satisfy demand).
Well, maybe, but I think the historical evidence suggests that that
approach will be a loser, simply because no matter how cheap, the
freelist will remain a centralized and heavily contended data structure.
IMO we need to be looking for a mechanism that has no single point of
contention, and modifying the clock sweep rules looks like the best way
to get there.

Still, he who wants to do the work can try whatever approach he feels
like.

It might be possible to partition the free list. So imagine, for
example, 8 free lists. The background writer runs the clock sweep and
finds some buffers that are about ready to be reallocated and
distributes one-eighth of them to each free list. Then, when a
backend wants to allocate a buffer, it picks a free list somehow
(round robin?) and pulls a buffer off it. If the buffer turns out to
have been used since it was added to the free list, we give up on it
and try again. This hopefully shouldn't happen too often, though, as
long as we only add enough buffers to the free list to satisfy the
requests that we expect to get over the next
small-fraction-of-a-second.

Of course you have to think about what happens if you find that your
chosen free list is empty. In that case you probably want to try a
different one, and if in the worst case where they're all empty, run
the clock sweep in the foreground. You probably also want to kick the
background writer in the pants if even a single one is empty (or
nearly empty?) and tell it to hurry up and add some more buffers to
the freelist.
If it comes down to it, we can look at partitioning the free list. But here's the thing: this is the strategy that FreeBSD (and I think now Linux as well) use to service memory requests, be they for free memory or for reading data from disk. If it's good enough for an OS to use, I would expect we could make it work as well.
I would expect that pulling a page off the free list would be an extremely fast operation... lock the read lock on the list (we should probably have separate locks for putting stuff on the list vs taking it off), pin the buffer (which shouldn't be contentious), update the read pointer and drop the read lock. Even in C code that's probably less than 100 machine instructions, and no looping. Compared to the cost of running a clock sweep it should be significantly faster. So while there will be contention on the free list read lock, none of that contention should last very long.
Do we have any other places where we take extremely short locks like this and still run into problems?
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jim Nasby <jim@nasby.net> writes:
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.

Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.
My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.
Simon is after a different solution involving getting
rid of the clock sweep...
err, No, he isn't. Not sure where that came from since I'm advocating
only minor changes there to curb worst case behaviour.
But lets discuss that on the main freelist thread.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
removebufmgrfreelist.v1.patch (text/x-patch)
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
* Initially link all the buffers together as unused. Subsequent
* management of this list is done by freelist.c.
*/
- buf->freeNext = i + 1;
+ StrategyInitFreelistBuffer(buf);
buf->io_in_progress_lock = LWLockAssign();
buf->content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
/* Clock sweep hand: index of next buffer to consider grabbing */
int nextVictimBuffer;
+#ifdef USE_BUFMGR_FREELIST
int firstFreeBuffer; /* Head of list of unused buffers */
int lastFreeBuffer; /* Tail of list of unused buffers */
@@ -34,7 +35,7 @@ typedef struct
* NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
* when the list is empty)
*/
-
+#endif
/*
* Statistics. These counters should be wide enough that they can't
* overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
*/
StrategyControl->numBufferAllocs++;
+#ifdef USE_BUFMGR_FREELIST
/*
* Try to get a buffer from the freelist. Note that the freeNext fields
* are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
}
UnlockBufHdr(buf);
}
+#endif
- /* Nothing on the freelist, so run the "clock sweep" algorithm */
+ /* Run the "clock sweep" algorithm */
trycounter = NBuffers;
for (;;)
{
@@ -182,20 +185,25 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
* If the buffer is pinned or has a nonzero usage_count, we cannot use
* it; decrement the usage_count (unless pinned) and keep scanning.
*/
- LockBufHdr(buf);
if (buf->refcount == 0)
{
- if (buf->usage_count > 0)
+ if (buf->usage_count > StrategyControl->completePasses)
{
buf->usage_count--;
trycounter = NBuffers;
}
else
{
- /* Found a usable buffer */
- if (strategy != NULL)
- AddBufferToRing(strategy, buf);
- return buf;
+ LockBufHdr(buf);
+ if (buf->refcount == 0)
+ {
+ UnlockBufHdr(buf);
+ /* Found a usable buffer */
+ if (strategy != NULL)
+ AddBufferToRing(strategy, buf);
+ return buf;
+ }
+ UnlockBufHdr(buf);
}
}
else if (--trycounter == 0)
@@ -207,10 +215,8 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
* probably better to fail than to risk getting stuck in an
* infinite loop.
*/
- UnlockBufHdr(buf);
elog(ERROR, "no unpinned buffers available");
}
- UnlockBufHdr(buf);
}
/* not reached */
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
void
StrategyFreeBuffer(volatile BufferDesc *buf)
{
+#ifdef USE_BUFMGR_FREELIST
LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
}
LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void
+StrategyInitFreelistBuffer(volatile BufferDesc *buf)
+{
+#ifdef USE_BUFMGR_FREELIST
+ /*
+ * Initially link all the buffers together as unused. Subsequent
+ * management of this list is done by freelist.c.
+ */
+ buf->freeNext = i + 1;
+#else
+ buf->freeNext = FREENEXT_NOT_IN_LIST;
+#endif
}
/*
@@ -331,12 +356,14 @@ StrategyInitialize(bool init)
*/
Assert(init);
+#ifdef USE_BUFMGR_FREELIST
/*
* Grab the whole linked list of free buffers for our strategy. We
* assume it was previously set up by InitBufferPool().
*/
StrategyControl->firstFreeBuffer = 0;
StrategyControl->lastFreeBuffer = NBuffers - 1;
+#endif
/* Initialize the clock sweep pointer */
StrategyControl->nextVictimBuffer = 0;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index e43719e..6a03d5d 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -190,6 +190,8 @@ extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
extern int StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
extern Size StrategyShmemSize(void);
extern void StrategyInitialize(bool init);
+extern void StrategyInitFreelistBuffer(volatile BufferDesc *buf);
+
/* buf_table.c */
extern Size BufTableShmemSize(int size);
On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.
v2 attached with cleanup of some random stuff that crept onto patch.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
removebufmgrfreelist.v2.patch (text/x-patch)
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index 94cefba..9332a74 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -115,7 +115,7 @@ InitBufferPool(void)
* Initially link all the buffers together as unused. Subsequent
* management of this list is done by freelist.c.
*/
- buf->freeNext = i + 1;
+ StrategyInitFreelistBuffer(buf);
buf->io_in_progress_lock = LWLockAssign();
buf->content_lock = LWLockAssign();
diff --git a/src/backend/storage/buffer/freelist.c b/src/backend/storage/buffer/freelist.c
index 3e62448..6b49cae 100644
--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -27,6 +27,7 @@ typedef struct
/* Clock sweep hand: index of next buffer to consider grabbing */
int nextVictimBuffer;
+#ifdef USE_BUFMGR_FREELIST
int firstFreeBuffer; /* Head of list of unused buffers */
int lastFreeBuffer; /* Tail of list of unused buffers */
@@ -34,7 +35,7 @@ typedef struct
* NOTE: lastFreeBuffer is undefined when firstFreeBuffer is -1 (that is,
* when the list is empty)
*/
-
+#endif
/*
* Statistics. These counters should be wide enough that they can't
* overflow during a single bgwriter cycle.
@@ -134,6 +135,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
*/
StrategyControl->numBufferAllocs++;
+#ifdef USE_BUFMGR_FREELIST
/*
* Try to get a buffer from the freelist. Note that the freeNext fields
* are considered to be protected by the BufFreelistLock not the
@@ -165,8 +167,9 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
}
UnlockBufHdr(buf);
}
+#endif
- /* Nothing on the freelist, so run the "clock sweep" algorithm */
+ /* Run the "clock sweep" algorithm */
trycounter = NBuffers;
for (;;)
{
@@ -223,6 +229,7 @@ StrategyGetBuffer(BufferAccessStrategy strategy, bool *lock_held)
void
StrategyFreeBuffer(volatile BufferDesc *buf)
{
+#ifdef USE_BUFMGR_FREELIST
LWLockAcquire(BufFreelistLock, LW_EXCLUSIVE);
/*
@@ -238,6 +245,24 @@ StrategyFreeBuffer(volatile BufferDesc *buf)
}
LWLockRelease(BufFreelistLock);
+#endif
+}
+
+/*
+ * StrategyInitFreelist: put a buffer on the freelist during InitBufferPool
+ */
+void
+StrategyInitFreelistBuffer(volatile BufferDesc *buf)
+{
+#ifdef USE_BUFMGR_FREELIST
+ /*
+ * Initially link all the buffers together as unused. Subsequent
+ * management of this list is done by freelist.c.
+ */
+ buf->freeNext = i + 1;
+#else
+ buf->freeNext = FREENEXT_NOT_IN_LIST;
+#endif
}
/*
@@ -331,12 +356,14 @@ StrategyInitialize(bool init)
*/
Assert(init);
+#ifdef USE_BUFMGR_FREELIST
/*
* Grab the whole linked list of free buffers for our strategy. We
* assume it was previously set up by InitBufferPool().
*/
StrategyControl->firstFreeBuffer = 0;
StrategyControl->lastFreeBuffer = NBuffers - 1;
+#endif
/* Initialize the clock sweep pointer */
StrategyControl->nextVictimBuffer = 0;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index e43719e..6a03d5d 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -190,6 +190,8 @@ extern bool StrategyRejectBuffer(BufferAccessStrategy strategy,
extern int StrategySyncStart(uint32 *complete_passes, uint32 *num_buf_alloc);
extern Size StrategyShmemSize(void);
extern void StrategyInitialize(bool init);
+extern void StrategyInitFreelistBuffer(volatile BufferDesc *buf);
+
/* buf_table.c */
extern Size BufTableShmemSize(int size);
On Sat, Dec 31, 2011 at 8:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Fri, Dec 30, 2011 at 10:20 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On ons, 2011-08-24 at 11:24 -0700, Daniel Farina wrote:
I was poking around at tablecmds and index.c and wonder if a similar
two-pass approach as used by CREATE INDEX CONCURRENTLY can be used to
create a DROP INDEX CONCURRENTLY, and if there would be any interest
in accepting such a patch.

Hmm, it seems I just independently came up with this same concept. My
problem is that if a CREATE INDEX CONCURRENTLY fails, you need an
exclusive lock on the table just to clean that up. If the table is
under constant load, you can't easily do that. So a two-pass DROP INDEX
CONCURRENTLY might have been helpful for me.

Here's a patch for this. Please review.
I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.
Maybe we need to change indisvalid to a "char" and make it three
valued: c = being created currently, v = valid, d = being dropped
concurrently, or something like that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.
ISTM that one would need to set indisready to false instead.
On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.

ISTM that one would need to set indisready to false instead.
Maybe we should set both to false?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.

ISTM that one would need to set indisready to false instead.
Maybe we should set both to false?
Well, ready = false and valid = true doesn't make any sense. There is
only just-created -> ready -> valid. We might as well convert that to a
single "char" column, as you had indicated in your earlier email. But
that's independent of the proposed patch.
On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.

ISTM that one would need to set indisready to false instead.
Maybe we should set both to false?
Well, ready = false and valid = true doesn't make any sense. There is
only just-created -> ready -> valid. We might as well convert that to a
single "char" column, as you had indicated in your earlier email. But
that's independent of the proposed patch.
Sure, but the point is that I think if you want everyone to stop
touching the index, you ought to mark it both not-valid and not-ready,
which the current patch doesn't do.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jan 18, 2012 at 1:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 17, 2012 at 12:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On mån, 2012-01-16 at 14:46 -0500, Robert Haas wrote:
On Mon, Jan 16, 2012 at 2:06 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On mån, 2012-01-16 at 11:17 -0500, Robert Haas wrote:
I don't see how setting indisvalid to false helps with this, because
IIUC when a session sees indisvalid = false, it is supposed to avoid
using the index for queries but still make new index entries when a
write operation happens - but to drop an index, I think you'd need to
get into a state where no one was using the index for anything at all.

ISTM that one would need to set indisready to false instead.
Maybe we should set both to false?
Well, ready = false and valid = true doesn't make any sense. There is
only just-created -> ready -> valid. We might as well convert that to a
single "char" column, as you had indicated in your earlier email. But
that's independent of the proposed patch.

Sure, but the point is that I think if you want everyone to stop
touching the index, you ought to mark it both not-valid and not-ready,
which the current patch doesn't do.
Thanks for the review and the important observation. I agree that I've
changed the wrong column. indisready must be set false. Also agree
setting both false makes most sense.
Can I just check with you that the only review comment is a one line
change? Seems better to make any additional review comments in one go.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Can I just check with you that the only review comment is a one line
change? Seems better to make any additional review comments in one go.
No, I haven't done a full review yet - I just noticed that right at
the outset and wanted to be sure that got addressed. I can look it
over more carefully, and test it.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Can I just check with you that the only review comment is a one line
change? Seems better to make any additional review comments in one go.
No, I haven't done a full review yet - I just noticed that right at
the outset and wanted to be sure that got addressed. I can look it
over more carefully, and test it.
Corrected your point, and another found during retest.
Works as advertised, docs corrected.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
drop_index_concurrently.v2.patch (text/x-patch; charset=US-ASCII)
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..aeb1531 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+ [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+ CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>
</synopsis>
</refsynopsisdiv>
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ..
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will drop the
+ index without taking any locks that prevent concurrent selects, inserts,
+ updates, or deletes on the table; whereas a standard index drop
+ waits for a lock that locks out everything on the table until it's done.
+ Concurrent drop index is a two stage process. First, we mark the index
+ both invalid and not ready then commit the change. Next we wait until
+ there are no users locking the table who can see the index.
+ </para>
+ <para>
+ There are several caveats to be aware of when using this option.
+ Only one index name can be specified if the <literal>CONCURRENTLY</literal>
+ parameter is specified. Only one concurrent index drop can occur on a
+ table at a time and no modifications on the table are allowed meanwhile.
+ Regular <command>DROP INDEX</> command can be performed within
+ a transaction block, but <command>DROP INDEX CONCURRENTLY</> cannot.
+ There is no CASCADE option when dropping an index concurrently.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="PARAMETER">name</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 0b3d489..6884fdb 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -171,9 +171,10 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
DropBehavior behavior,
int msglevel,
const ObjectAddress *origObject);
-static void deleteOneObject(const ObjectAddress *object, Relation depRel);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void deleteOneObject(const ObjectAddress *object, Relation depRel,
+ int option_flags);
+static void doDeletion(const ObjectAddress *object, int option_flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int option_flags);
static void ReleaseDeletionLock(const ObjectAddress *object);
static bool find_expr_references_walker(Node *node,
find_expr_references_context *context);
@@ -224,7 +225,7 @@ performDeletion(const ObjectAddress *object,
* Acquire deletion lock on the target object. (Ideally the caller has
* done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(object);
+ AcquireDeletionLock(object, 0);
/*
* Construct a list of objects to delete (ie, the given object plus
@@ -254,7 +255,7 @@ performDeletion(const ObjectAddress *object,
{
ObjectAddress *thisobj = targetObjects->refs + i;
- deleteOneObject(thisobj, depRel);
+ deleteOneObject(thisobj, depRel, 0);
}
/* And clean up */
@@ -276,6 +277,13 @@ void
performMultipleDeletions(const ObjectAddresses *objects,
DropBehavior behavior)
{
+ performMultipleDeletionsWithOptions(objects, behavior, 0);
+}
+
+void
+performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+ DropBehavior behavior, int option_flags)
+{
Relation depRel;
ObjectAddresses *targetObjects;
int i;
@@ -308,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
* Acquire deletion lock on each target object. (Ideally the caller
* has done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(thisobj);
+ AcquireDeletionLock(thisobj, option_flags);
findDependentObjects(thisobj,
DEPFLAG_ORIGINAL,
@@ -336,13 +344,17 @@ performMultipleDeletions(const ObjectAddresses *objects,
{
ObjectAddress *thisobj = targetObjects->refs + i;
- deleteOneObject(thisobj, depRel);
+ deleteOneObject(thisobj, depRel, option_flags);
}
/* And clean up */
free_object_addresses(targetObjects);
- heap_close(depRel, RowExclusiveLock);
+ /*
+ * We closed depRel earlier in deleteOneObject if doing a drop concurrently
+ */
+ if ((option_flags & OPT_CONCURRENTLY) != OPT_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
}
/*
@@ -372,7 +384,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
* Acquire deletion lock on the target object. (Ideally the caller has
* done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(object);
+ AcquireDeletionLock(object, 0);
/*
* Construct a list of objects to delete (ie, the given object plus
@@ -407,7 +419,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
if (thisextra->flags & DEPFLAG_ORIGINAL)
continue;
- deleteOneObject(thisobj, depRel);
+ deleteOneObject(thisobj, depRel, 0);
}
/* And clean up */
@@ -596,7 +608,7 @@ findDependentObjects(const ObjectAddress *object,
* deletion of the owning object.)
*/
ReleaseDeletionLock(object);
- AcquireDeletionLock(&otherObject);
+ AcquireDeletionLock(&otherObject, 0);
/*
* The owning object might have been deleted while we waited
@@ -691,7 +703,7 @@ findDependentObjects(const ObjectAddress *object,
/*
* Must lock the dependent object before recursing to it.
*/
- AcquireDeletionLock(&otherObject);
+ AcquireDeletionLock(&otherObject, 0);
/*
* The dependent object might have been deleted while we waited to
@@ -950,7 +962,7 @@ reportDependentObjects(const ObjectAddresses *targetObjects,
* depRel is the already-open pg_depend relation.
*/
static void
-deleteOneObject(const ObjectAddress *object, Relation depRel)
+deleteOneObject(const ObjectAddress *object, Relation depRel, int option_flags)
{
ScanKeyData key[3];
int nkeys;
@@ -1001,9 +1013,16 @@ deleteOneObject(const ObjectAddress *object, Relation depRel)
object->objectSubId);
/*
+ * Close depRel if we are doing a drop concurrently because it
+ * commits the transaction, so we don't want dangling references.
+ */
+ if ((option_flags & OPT_CONCURRENTLY) == OPT_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
+
+ /*
* Now delete the object itself, in an object-type-dependent way.
*/
- doDeletion(object);
+ doDeletion(object, option_flags);
/*
* Delete any comments or security labels associated with this object.
@@ -1028,7 +1047,7 @@ deleteOneObject(const ObjectAddress *object, Relation depRel)
* doDeletion: actually delete a single object
*/
static void
-doDeletion(const ObjectAddress *object)
+doDeletion(const ObjectAddress *object, int option_flags)
{
switch (getObjectClass(object))
{
@@ -1038,8 +1057,11 @@ doDeletion(const ObjectAddress *object)
if (relKind == RELKIND_INDEX)
{
+ bool concurrent = ((option_flags & OPT_CONCURRENTLY)
+ == OPT_CONCURRENTLY);
+
Assert(object->objectSubId == 0);
- index_drop(object->objectId);
+ index_drop(object->objectId, concurrent);
}
else
{
@@ -1175,10 +1197,15 @@ doDeletion(const ObjectAddress *object)
* shared-across-databases object, so we have no need for LockSharedObject.
*/
static void
-AcquireDeletionLock(const ObjectAddress *object)
+AcquireDeletionLock(const ObjectAddress *object, int option_flags)
{
if (object->classId == RelationRelationId)
- LockRelationOid(object->objectId, AccessExclusiveLock);
+ {
+ if ((option_flags & OPT_CONCURRENTLY) == OPT_CONCURRENTLY)
+ LockRelationOid(object->objectId, ShareUpdateExclusiveLock);
+ else
+ LockRelationOid(object->objectId, AccessExclusiveLock);
+ }
else
/* assume we should lock the whole object not a sub-object */
LockDatabaseObject(object->classId, object->objectId, 0,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bfbe642..5b7d5f0 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1282,7 +1282,7 @@ index_constraint_create(Relation heapRelation,
* else associated dependencies won't be cleaned up.
*/
void
-index_drop(Oid indexId)
+index_drop(Oid indexId, bool concurrent)
{
Oid heapId;
Relation userHeapRelation;
@@ -1290,6 +1290,12 @@ index_drop(Oid indexId)
Relation indexRelation;
HeapTuple tuple;
bool hasexprs;
+ LockRelId heaprelid,
+ indexrelid;
+ LOCKTAG heaplocktag,
+ indexlocktag;
+ VirtualTransactionId *old_lockholders;
+ Form_pg_index indexForm;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1302,17 +1308,118 @@ index_drop(Oid indexId)
* that will make them update their index lists.
*/
heapId = IndexGetRelation(indexId, false);
- userHeapRelation = heap_open(heapId, AccessExclusiveLock);
-
- userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ if (concurrent)
+ {
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ }
+ else
+ {
+ userHeapRelation = heap_open(heapId, AccessExclusiveLock);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ }
/*
- * There can no longer be anyone *else* touching the index, but we might
- * still have open queries using it in our own session.
+ * We might still have open queries using it in our own session.
*/
CheckTableNotInUse(userIndexRelation, "DROP INDEX");
/*
+ * Drop Index concurrently is similar in many ways to creating an
+ * index concurrently, so some actions are similar to DefineIndex()
+ */
+ if (concurrent)
+ {
+ /*
+ * Mark index invalid by updating its pg_index entry
+ *
+ * Don't Assert(indexForm->indisvalid) because we may be trying to
+ * clear up after an error when trying to create an index which left
+ * the index invalid
+ */
+ indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexId));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for index %u", indexId);
+ indexForm = (Form_pg_index) GETSTRUCT(tuple);
+
+ indexForm->indisvalid = false; /* make unusable for queries */
+ indexForm->indisready = false; /* make invisible to changes */
+
+ simple_heap_update(indexRelation, &tuple->t_self, tuple);
+ CatalogUpdateIndexes(indexRelation, tuple);
+
+ heap_close(indexRelation, RowExclusiveLock);
+
+ /* save lockrelid and locktag for below, then close but keep locks */
+ heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
+ heap_close(userHeapRelation, NoLock);
+
+ indexrelid = userIndexRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(indexlocktag, indexrelid.dbId, indexrelid.relId);
+ index_close(userIndexRelation, NoLock);
+
+ /*
+ * For a concurrent drop, it's important to make the catalog entries
+ * visible to other transactions before we drop the index. The index
+ * will be marked not indisvalid, so that no one else tries to either
+ * insert into it or use it for queries.
+ *
+ * We must commit our current transaction so that the index update becomes
+ * visible; then start another. Note that all the data structures we just
+ * built are lost in the commit. The only data we keep past here are the
+ * relation IDs.
+ *
+ * Before committing, get a session-level lock on the table, to ensure
+ * that neither it nor the index can be dropped before we finish. This
+ * cannot block, even if someone else is waiting for access, because we
+ * already have the same lock within our transaction.
+ */
+ LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ LockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Now we must wait until no running transaction could have the table open
+ * with the old list of indexes. To do this, inquire which xacts
+ * currently would conflict with AccessExclusiveLock on the table -- ie,
+ * which ones have a lock of any kind on the table. Then wait for each of
+ * these xacts to commit or abort. Note we do not need to worry about
+ * xacts that open the table for writing after this point; they will see
+ * the index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than just
+ * checking the ProcArray and sleeping, is that deadlock is possible if
+ * one of the transactions in question is blocked trying to acquire an
+ * exclusive lock on our table. The lock code will detect deadlock and
+ * error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+ old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+
+ /*
+ * Re-open relations to allow us to complete our actions
+ */
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ }
+
+ /*
* All predicate locks on the index are about to be made invalid. Promote
* them to relation locks on the heap.
*/
@@ -1378,6 +1485,15 @@ index_drop(Oid indexId)
* Close owning rel, but keep lock
*/
heap_close(userHeapRelation, NoLock);
+
+ /*
+ * Release the session locks before we go.
+ */
+ if (concurrent)
+ {
+ UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ UnlockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index cc210f0..b142816 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -239,6 +239,7 @@ struct DropRelationCallbackState
{
char relkind;
Oid heapOid;
+ bool concurrent;
};
/* Alter table target-type flags for ATSimplePermissions */
@@ -734,6 +735,7 @@ RemoveRelations(DropStmt *drop)
ObjectAddresses *objects;
char relkind;
ListCell *cell;
+ int option_flags = 0;
/*
* First we identify all the relations, then we delete them in a single
@@ -796,7 +798,14 @@ RemoveRelations(DropStmt *drop)
/* Look up the appropriate relation using namespace search. */
state.relkind = relkind;
state.heapOid = InvalidOid;
- relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock, true,
+ state.concurrent = drop->concurrent;
+ if (drop->concurrent)
+ relOid = RangeVarGetRelidExtended(rel, ShareUpdateExclusiveLock, true,
+ false,
+ RangeVarCallbackForDropRelation,
+ (void *) &state);
+ else
+ relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock, true,
false,
RangeVarCallbackForDropRelation,
(void *) &state);
@@ -816,7 +825,20 @@ RemoveRelations(DropStmt *drop)
add_exact_object_address(&obj, objects);
}
- performMultipleDeletions(objects, drop->behavior);
+ /*
+ * Set options and check further requirements for concurrent drop
+ */
+ if (drop->concurrent)
+ {
+ /*
+ * Confirm that concurrent behaviour is restricted in grammar.
+ */
+ Assert(drop->removeType == OBJECT_INDEX);
+
+ option_flags |= OPT_CONCURRENTLY;
+ }
+
+ performMultipleDeletionsWithOptions(objects, drop->behavior, option_flags);
free_object_addresses(objects);
}
@@ -885,7 +907,8 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
{
state->heapOid = IndexGetRelation(relOid, true);
if (OidIsValid(state->heapOid))
- LockRelationOid(state->heapOid, AccessExclusiveLock);
+ LockRelationOid(state->heapOid,
+ (state->concurrent ? ShareUpdateExclusiveLock: AccessExclusiveLock));
}
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 71da0d8..6f19ec3 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2762,6 +2762,7 @@ _copyDropStmt(const DropStmt *from)
COPY_SCALAR_FIELD(removeType);
COPY_SCALAR_FIELD(behavior);
COPY_SCALAR_FIELD(missing_ok);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index ba949db..1e6700e 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1188,6 +1188,7 @@ _equalDropStmt(const DropStmt *a, const DropStmt *b)
COMPARE_SCALAR_FIELD(removeType);
COMPARE_SCALAR_FIELD(behavior);
COMPARE_SCALAR_FIELD(missing_ok);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 0ec039b..198cc6f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -213,7 +213,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
CreatedbStmt DeclareCursorStmt DefineStmt DeleteStmt DiscardStmt DoStmt
DropGroupStmt DropOpClassStmt DropOpFamilyStmt DropPLangStmt DropStmt
DropAssertStmt DropTrigStmt DropRuleStmt DropCastStmt DropRoleStmt
- DropUserStmt DropdbStmt DropTableSpaceStmt DropFdwStmt
+ DropUserStmt DropdbStmt DropTableSpaceStmt DropFdwStmt DropIndexStmt
DropForeignServerStmt DropUserMappingStmt ExplainStmt FetchStmt
GrantStmt GrantRoleStmt IndexStmt InsertStmt ListenStmt LoadStmt
LockStmt NotifyStmt ExplainableStmt PreparableStmt
@@ -307,7 +307,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
opt_column_list columnList opt_name_list
sort_clause opt_sort_clause sortby_list index_params
name_list from_clause from_list opt_array_bounds
- qualified_name_list any_name any_name_list
+ qualified_name_list any_name any_name_list any_name_single
any_operator expr_list attrs
target_list insert_column_list set_target_list
set_clause_list set_clause multiple_set_clause
@@ -741,6 +741,7 @@ stmt :
| DropFdwStmt
| DropForeignServerStmt
| DropGroupStmt
+ | DropIndexStmt
| DropOpClassStmt
| DropOpFamilyStmt
| DropOwnedStmt
@@ -4743,7 +4744,6 @@ DropStmt: DROP drop_type IF_P EXISTS any_name_list opt_drop_behavior
drop_type: TABLE { $$ = OBJECT_TABLE; }
| SEQUENCE { $$ = OBJECT_SEQUENCE; }
| VIEW { $$ = OBJECT_VIEW; }
- | INDEX { $$ = OBJECT_INDEX; }
| FOREIGN TABLE { $$ = OBJECT_FOREIGN_TABLE; }
| TYPE_P { $$ = OBJECT_TYPE; }
| DOMAIN_P { $$ = OBJECT_DOMAIN; }
@@ -4772,6 +4772,44 @@ attrs: '.' attr_name
{ $$ = lappend($1, makeString($3)); }
;
+any_name_single:
+ any_name { $$ = list_make1($1); }
+ ;
+
+DropIndexStmt: DROP INDEX IF_P EXISTS any_name_list
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = TRUE;
+ n->objects = $5;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = false;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX any_name_list
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = FALSE;
+ n->objects = $3;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = false;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX CONCURRENTLY any_name_single
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = FALSE;
+ n->objects = $4;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = true;
+ $$ = (Node *)n;
+ }
+ ;
/*****************************************************************************
*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index de16a61..7549666 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -631,10 +631,15 @@ standard_ProcessUtility(Node *parsetree,
case T_DropStmt:
switch (((DropStmt *) parsetree)->removeType)
{
+ case OBJECT_INDEX:
+ if (((DropStmt *) parsetree)->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "DROP INDEX CONCURRENTLY");
+ /* fall through */
+
case OBJECT_TABLE:
case OBJECT_SEQUENCE:
case OBJECT_VIEW:
- case OBJECT_INDEX:
case OBJECT_FOREIGN_TABLE:
RemoveRelations((DropStmt *) parsetree);
break;
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 087341e..467e385 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -158,6 +158,11 @@ extern void performDeletion(const ObjectAddress *object,
extern void performMultipleDeletions(const ObjectAddresses *objects,
DropBehavior behavior);
+extern void performMultipleDeletionsWithOptions(const ObjectAddresses *objects,
+ DropBehavior behavior, int option_flags);
+/* options specified for object deletion */
+#define OPT_CONCURRENTLY 0x01
+
extern void deleteWhatDependsOn(const ObjectAddress *object,
bool showNotices);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index c7f1dd2..3f73a6c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -63,7 +63,7 @@ extern void index_constraint_create(Relation heapRelation,
bool update_pgindex,
bool allow_system_table_mods);
-extern void index_drop(Oid indexId);
+extern void index_drop(Oid indexId, bool concurrent);
extern IndexInfo *BuildIndexInfo(Relation index);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index dce0e72..82c4397 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1906,6 +1906,7 @@ typedef struct DropStmt
ObjectType removeType; /* object type */
DropBehavior behavior; /* RESTRICT or CASCADE behavior */
bool missing_ok; /* skip error if object is missing? */
+ bool concurrent; /* drop index concurrently? */
} DropStmt;
/* ----------------------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index b1fcada..b4d2b0d 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2204,6 +2204,38 @@ Indexes:
"concur_index5" btree (f2) WHERE f1 = 'x'::text
"std_index" btree (f2)
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2";
+-- failures
+DROP INDEX IF EXISTS CONCURRENTLY;
+ERROR: syntax error at or near "CONCURRENTLY"
+LINE 1: DROP INDEX IF EXISTS CONCURRENTLY;
+ ^
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+ERROR: syntax error at or near ","
+LINE 1: DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+ ^
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ERROR: DROP INDEX CONCURRENTLY cannot run inside a transaction block
+ROLLBACK;
+-- successes
+DROP INDEX CONCURRENTLY "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+\d concur_heap
+Table "public.concur_heap"
+ Column | Type | Modifiers
+--------+------+-----------
+ f1 | text |
+ f2 | text |
+Indexes:
+ "std_index" btree (f2)
+
DROP TABLE concur_heap;
--
-- Test ADD CONSTRAINT USING INDEX
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 5e5fc22..77c3613 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -695,6 +695,27 @@ COMMIT;
\d concur_heap
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2";
+
+-- failures
+DROP INDEX IF EXISTS CONCURRENTLY;
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ROLLBACK;
+
+-- successes
+DROP INDEX CONCURRENTLY "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+
+\d concur_heap
+
DROP TABLE concur_heap;
--
On 04.01.2012 13:14, Simon Riggs wrote:
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
Jim Nasby<jim@nasby.net> writes:
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.
Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.
I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.
My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.
So, you're proposing that we remove freelist altogether? Sounds
reasonable, but that needs to be performance tested somehow. I'm not
sure what exactly the test should look like, but it probably should
involve an OLTP workload, and large tables being created, populated to
fill the cache with pages from the table, and dropped, while the OLTP
workload is running. I'd imagine that the worst case scenario looks
something like that.
Removing the freelist should speed up the drop, but the question is how
costly are the extra clock sweeps that the backends have to do.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
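A rough sketch of the worst-case workload Heikki describes, for anyone wanting to try it (names and sizes are illustrative; the table should be sized so its pages dominate shared_buffers, and an OLTP load such as pgbench should be running against the same cluster throughout):

```sql
-- While the background OLTP workload runs, repeat this cycle and
-- measure the latency of the DROP, plus any throughput dip caused by
-- backends having to do extra clock-sweep work afterwards.
CREATE TABLE churn AS
  SELECT g AS id, repeat('x', 100) AS pad
  FROM generate_series(1, 5000000) g;   -- size to exceed shared_buffers
DROP TABLE churn;                        -- the latency of interest is here
```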
On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Can I just check with you that the only review comment is a one line
change? Seems better to make any additional review comments in one go.
No, I haven't done a full review yet - I just noticed that right at
the outset and wanted to be sure that got addressed. I can look it
over more carefully, and test it.
Corrected your point, and another found during retest.
Works as advertised, docs corrected.
This comment needs updating:
/*
* To drop an index safely, we must grab exclusive lock on its parent
* table. Exclusive lock on the index alone is insufficient because
* another backend might be about to execute a query on the parent table.
* If it relies on a previously cached list of index OIDs, then it could
* attempt to access the just-dropped index. We must therefore take a
* table lock strong enough to prevent all queries on the table from
* proceeding until we commit and send out a shared-cache-inval notice
* that will make them update their index lists.
*/
Speaking of that comment, why does a concurrent index drop need any
lock at all on the parent table? If, as the current comment text
says, the point of locking the parent table is to make sure that no
new backends can create relcache entries until our transaction
commits, then it's not needed here, or at least not for that purpose.
We're going to out-wait anyone who has the index open *after* we've
committed our changes to the pg_index entry, so there shouldn't be a
race condition there. There may very well be a reason why we need a
lock on the parent, or there may not, but that's not it.
On the flip side, it strikes me that there's considerable danger that
ShareUpdateExclusiveLock on the index might not be strong enough. In
particular, it would fall down if there's anyone who takes an
AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
to access the index data despite its being marked !indisready and
!indisvalid. There are certainly instances of this in contrib, such
as in pgstatindex(), which I'm pretty sure will abort with a nastygram
if somebody truncates away the file under it. That might not be
enough reason to reject the patch, but I do think it would be the only
place in the system where we allow something like that to happen. I'm
not sure whether there are any similar risks in core - I am a bit
worried about get_actual_variable_range() and gincostestimate(), but I
can't tell for sure whether there is any actual problem there, or
elsewhere. I wonder if we should only do the *first* phase of the
drop with ShareUpdateExclusiveLock, and then after outwaiting
everyone, take an AccessExclusiveLock on the index (but not the table)
to do the rest of the work. If that doesn't allow the drop to proceed
concurrently, then we go find the places that block against it and
teach them not to lock/use/examine indexes that are !indisready. That
way, the worst case is that D-I-C is less concurrent than we'd like,
rather than making random other things fall over - specifically,
anything that counts on our traditional guarantee that AccessShareLock
is enough to keep the file from disappearing.
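To make the hazard concrete, here is a hypothetical interleaving (index name invented) in which ShareUpdateExclusiveLock alone is not enough:

```sql
-- Session 1: pgstatindex takes only AccessShareLock on the index,
-- then reads the index file directly, block by block.
SELECT * FROM pgstatindex('some_idx');

-- Session 2: with the patch as posted, this takes only
-- ShareUpdateExclusiveLock, which does not conflict with session 1's
-- AccessShareLock -- so the index files can be truncated away while
-- session 1 is still reading them.
DROP INDEX CONCURRENTLY some_idx;
```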
In the department of minor nitpicks, the syntax synopsis you've added
looks wrong:
DROP INDEX
[ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
[, ...] [ CASCADE | RESTRICT ]
CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>
That implies that you may write if exists, followed by 1 or more
names, followed by cascade or restrict. After that you must write
CONCURRENTLY, followed by exactly one name. I think what you want is:
DROP INDEX [ IF EXISTS ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
DROP INDEX CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>
...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
CONCURRENTLY. It could also be very useful to be able to concurrently
drop multiple indexes at once, since that would require waiting out
all concurrent transactions only once instead of once per drop.
Either way, I think we should have only one syntax, and then reject
combinations we can't handle in the code. So I think we should have
the parser accept:
DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
And then, if you try to use cascade, or try to drop multiple indexes
at once, we should throw an error saying that those options aren't
supported in combination with CONCURRENTLY, rather than rejecting them
as a syntax error at parse time. That gives a better error message
and will make it easier for anyone who wants to extend this in the
future - plus it will allow things like DROP INDEX CONCURRENTLY bob
RESTRICT, which ought to be legal even if CASCADE isn't supported.
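Under that unified grammar, the parser would accept all of the following, with the unsupported combinations rejected afterwards with descriptive errors rather than syntax errors (statements are illustrative):

```sql
DROP INDEX CONCURRENTLY bob RESTRICT;    -- legal: RESTRICT is the default
DROP INDEX IF EXISTS CONCURRENTLY bob;   -- legal, if IF EXISTS is supported
DROP INDEX CONCURRENTLY bob, alice;      -- parses; then ERROR: only one
                                         -- index at a time (unless allowed)
DROP INDEX CONCURRENTLY bob CASCADE;     -- parses; then ERROR: CASCADE is
                                         -- not supported with CONCURRENTLY
```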
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Jan 21, 2012 at 1:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Jan 19, 2012 at 10:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, Jan 18, 2012 at 11:12 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Jan 18, 2012 at 5:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
Can I just check with you that the only review comment is a one line
change? Seems better to make any additional review comments in one go.No, I haven't done a full review yet - I just noticed that right at
the outset and wanted to be sure that got addressed. I can look it
over more carefully, and test it.

Corrected your point, and another found during retest.
Works as advertised, docs corrected.
This comment needs updating:
/*
* To drop an index safely, we must grab exclusive lock on its parent
* table. Exclusive lock on the index alone is insufficient because
* another backend might be about to execute a query on the parent table.
* If it relies on a previously cached list of index OIDs, then it could
* attempt to access the just-dropped index. We must therefore take a
* table lock strong enough to prevent all queries on the table from
* proceeding until we commit and send out a shared-cache-inval notice
* that will make them update their index lists.
*/
OK, I did reword some but clearly not enough.
Speaking of that comment, why does a concurrent index drop need any
lock at all on the parent table? If, as the current comment text
says, the point of locking the parent table is to make sure that no
new backends can create relcache entries until our transaction
commits, then it's not needed here, or at least not for that purpose.
We're going to out-wait anyone who has the index open *after* we've
committed our changes to the pg_index entry, so there shouldn't be a
race condition there. There may very well be a reason why we need a
lock on the parent, or there may not, but that's not it.
I feel happier locking the parent as well. I'd rather avoid strange
weirdness later as has happened before in this type of discussion.
On the flip side, it strikes me that there's considerable danger that
ShareUpdateExclusiveLock on the index might not be strong enough. In
particular, it would fall down if there's anyone who takes an
AccessShareLock, RowShareLock, or RowExclusiveLock and then proceeds
to access the index data despite its being marked !indisready and
!indisvalid. There are certainly instances of this in contrib, such
as in pgstatindex(), which I'm pretty sure will abort with a nastygram
if somebody truncates away the file under it. That might not be
enough reason to reject the patch, but I do think it would be the only
place in the system where we allow something like that to happen. I'm
not sure whether there are any similar risks in core - I am a bit
worried about get_actual_variable_range() and gincostestimate(), but I
can't tell for sure whether there is any actual problem there, or
elsewhere. I wonder if we should only do the *first* phase of the
drop with ShareUpdateExclusiveLock, and then after outwaiting
everyone, take an AccessExclusiveLock on the index (but not the table)
to do the rest of the work. If that doesn't allow the drop to proceed
concurrently, then we go find the places that block against it and
teach them not to lock/use/examine indexes that are !indisready. That
way, the worst case is that D-I-C is less concurrent than we'd like,
rather than making random other things fall over - specifically,
anything that counts on our traditional guarantee that AccessShareLock
is enough to keep the file from disappearing.
Your caution is wise. All users of an index have already checked
whether the index is usable at plan time, so although there is much
code that assumes they can look at the index itself, that is not
executed until after the correct checks.
I'll look at VACUUM and other utilities, so thanks for that.
I don't see much point in having the higher level lock, except perhaps
as a test this code works.
In the department of minor nitpicks, the syntax synopsis you've added
looks wrong:

DROP INDEX
[ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable>
[, ...] [ CASCADE | RESTRICT ]
CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

That implies that you may write if exists, followed by 1 or more
names, followed by cascade or restrict. After that you must write
CONCURRENTLY, followed by exactly one name. I think what you want is:

DROP INDEX [ IF EXISTS ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
DROP INDEX CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>

...but I also wonder if we shouldn't support DROP INDEX [ IF EXISTS ]
CONCURRENTLY. It could also be very useful to be able to concurrently
drop multiple indexes at once, since that would require waiting out
all concurrent transactions only once instead of once per drop.
Either way, I think we should have only one syntax, and then reject
combinations we can't handle in the code. So I think we should have
the parser accept:

DROP INDEX [ IF EXISTS ] [ CONCURRENTLY ] <replaceable
class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]

And then, if you try to use cascade, or try to drop multiple indexes
at once, we should throw an error saying that those options aren't
supported in combination with CONCURRENTLY, rather than rejecting them
as a syntax error at parse time. That gives a better error message
and will make it easier for anyone who wants to extend this in the
future - plus it will allow things like DROP INDEX CONCURRENTLY bob
RESTRICT, which ought to be legal even if CASCADE isn't supported.
The syntax diagram is correct because there are restrictions there.
That's mostly because of problems trying to get the grammar clean and
conflict free, so that's the best I could do.
If you or anyone else know better how to do that, I'd appreciate some
insight there.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:
On 04.01.2012 13:14, Simon Riggs wrote:
On Tue, Jan 3, 2012 at 11:28 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
Jim Nasby<jim@nasby.net> writes:
On Jan 3, 2012, at 12:11 PM, Simon Riggs wrote:
This could well be related to the fact that DropRelFileNodeBuffers()
does a scan of shared_buffers, which is an O(N) approach no matter the
size of the index.

Couldn't we just leave the buffers alone? Once an index is dropped and that's pushed out through the catalog then nothing should be trying to access them and they'll eventually just get aged out.
No, we can't, because if they're still dirty then the bgwriter would
first try to write them to the no-longer-existing storage file. It's
important that we kill the buffers immediately during relation drop.

I'm still thinking that it might be sufficient to mark the buffers
invalid and let the clock sweep find them, thereby eliminating the need
for a freelist.

My patch puts things on the freelist only when it is free to do so.
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.

So, you're proposing that we remove freelist altogether? Sounds reasonable, but that needs to be performance tested somehow. I'm not sure what exactly the test should look like, but it probably should involve an OLTP workload, and large tables being created, populated to fill the cache with pages from the table, and dropped, while the OLTP workload is running. I'd imagine that the worst case scenario looks something like that.
We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
--
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby <jim@nasby.net> wrote:
We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
That's kinda my feeling as well. The free list in its current form is
pretty much useless, but I don't think we'll save much by getting rid
of it, because that's just a single test. The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!). So I guess
my vote would be to leave it alone pending further study, and maybe
remove it later if we can't find a way to rejigger it to be more
useful.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby <jim@nasby.net> wrote:
We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
That's kinda my feeling as well. The free list in its current form is
pretty much useless, but I don't think we'll save much by getting rid
of it, because that's just a single test. The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!).
Yeah ... spinlocks that, by definition, will be uncontested. So I think
it would be advisable to prove rather than just assume that that's a
problem.
regards, tom lane
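For readers following the cost argument: the per-buffer locks under discussion are classic test-and-set spinlocks. A minimal C11 sketch (PostgreSQL's real implementation is the platform-specific TAS code in s_lock.h, not this) shows that even an uncontested acquire/release is an atomic read-modify-write plus memory barriers, which is the cost being debated when iterating over many buffer headers while holding BufFreelistLock:

```c
#include <stdatomic.h>

/* Simplified test-and-set spinlock, for illustration only. */
typedef struct
{
    atomic_flag locked;
} spinlock;

void
spin_lock(spinlock *s)
{
    /* atomic RMW with acquire semantics; spins while already held */
    while (atomic_flag_test_and_set_explicit(&s->locked, memory_order_acquire))
        ;                       /* busy-wait */
}

void
spin_unlock(spinlock *s)
{
    /* plain store with release semantics */
    atomic_flag_clear_explicit(&s->locked, memory_order_release);
}
```

Whether one such uncontested RMW per buffer is cheap enough to do while holding a badly contended LWLock is exactly the measurement question at issue here.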
On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby <jim@nasby.net> wrote:
We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
That's kinda my feeling as well. The free list in its current form is
pretty much useless, but I don't think we'll save much by getting rid
of it, because that's just a single test. The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!).

Yeah ... spinlocks that, by definition, will be uncontested.
What makes you think that they are uncontested? Or for that matter,
that even an uncontested spinlock operation is cheap enough to do
while holding a badly contended LWLock?
So I think
it would be advisable to prove rather than just assume that that's a
problem.
It's pretty trivial to prove that there is a very serious problem with
BufFreelistLock. I'll admit I can't prove what the right fix is just
yet, and certainly measurement is warranted.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Jan 23, 2012 at 1:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:
It's pretty trivial to prove that there is a very serious problem with
BufFreelistLock. I'll admit I can't prove what the right fix is just
yet, and certainly measurement is warranted.
I agree there is a problem with BufFreelistLock (so please share your
results with Heikki, who seems not to).
As a result, I've published patches which reduce contention on that
lock in various ways, all of which seem valid to me.
There are lots of things we could have done for 9.2 but didn't, yet we
do have some things on the table right now that did get done, so it
would be useful to push those through immediately, or at least defer
discussion of other things until we get back around to this.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!).
Yeah ... spinlocks that, by definition, will be uncontested.
What makes you think that they are uncontested?
Ah, never mind. I was thinking that we'd only be touching buffers that
were *on* the freelist, but of course this is incorrect. The real
problem there is that BufFreelistLock is also used to protect the
clock sweep pointer. I think basically we gotta find a way to allow
multiple backends to run clock sweeps concurrently. Or else fix
things so that the freelist never (well, hardly ever) runs dry.
regards, tom lane
On Mon, Jan 23, 2012 at 11:01 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!).

Yeah ... spinlocks that, by definition, will be uncontested.
What makes you think that they are uncontested?
Ah, never mind. I was thinking that we'd only be touching buffers that
were *on* the freelist, but of course this is incorrect. The real
problem there is that BufFreelistLock is also used to protect the
clock sweep pointer. I think basically we gotta find a way to allow
multiple backends to run clock sweeps concurrently. Or else fix
things so that the freelist never (well, hardly ever) runs dry.
I'd come to the same conclusion myself. :-)
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
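Tom's suggestion of letting multiple backends run clock sweeps concurrently could, in principle, hang the sweep pointer off an atomic counter instead of BufFreelistLock. The following is a simplified, hypothetical sketch using C11 atomics — it ignores buffer pinning, the buffer header spinlock, and the freelist itself, and is not the actual freelist.c code:

```c
#include <stdatomic.h>

#define NBUFFERS 16             /* toy pool size */

static atomic_uint sweep_next;              /* shared clock hand */
static atomic_int  usage_count[NBUFFERS];   /* simplified usage counts */

/* Bump a buffer's usage count, capped at 5 (mirroring BM_MAX_USAGE_COUNT). */
void
touch_buffer(int buf)
{
    int uc = atomic_load(&usage_count[buf]);

    while (uc < 5 &&
           !atomic_compare_exchange_weak(&usage_count[buf], &uc, uc + 1))
        ;                       /* CAS failed; uc was reloaded, retry */
}

/* Advance the clock hand with fetch_add so any number of backends can
 * sweep at once; decrement usage counts until a zero-usage victim is found. */
int
clock_sweep_victim(void)
{
    for (;;)
    {
        unsigned buf = atomic_fetch_add(&sweep_next, 1) % NBUFFERS;
        int      uc = atomic_load(&usage_count[buf]);

        while (uc > 0)
        {
            /* try to decrement; on success, skip this buffer */
            if (atomic_compare_exchange_weak(&usage_count[buf], &uc, uc - 1))
                break;
        }
        if (uc == 0)
            return (int) buf;   /* found an unused victim */
    }
}
```

The design point is that `atomic_fetch_add` serializes only the hand itself, so sweeping backends no longer queue on one LWLock; the open questions (fairness, interaction with pinned buffers, the hand wrapping past a slow sweeper) are exactly why this is sketch-level only.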
On Sat, Jan 21, 2012 at 2:29 PM, Jim Nasby <jim@nasby.net> wrote:
On Jan 20, 2012, at 11:54 AM, Heikki Linnakangas wrote:
So, you're proposing that we remove freelist altogether? Sounds reasonable, but that needs to be performance tested somehow. I'm not sure what exactly the test should look like, but it probably should involve an OLTP workload, and large tables being created, populated to fill the cache with pages from the table, and dropped, while the OLTP workload is running. I'd imagine that the worst case scenario looks something like that.
We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
If the head and tail are both protected by BufFreelistLock, I'm pretty
sure this will make things worse, not better.
If we change to having head and tail protected by separate spinlocks,
then I don't see how you can remove the last buffer from the list, or
add a buffer to an empty list, without causing havoc.
Does anyone have ideas for implementing a cross-platform, lock-free,
FIFO linked list? Without that, I don't see how we are going to get
anywhere on this approach.
Cheers,
Jeff
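To make the question concrete: the single-producer/single-consumer case is straightforward with C11 atomics, as the hypothetical sketch below shows. The hard part is the multi-producer/multi-consumer queue the freelist would actually need, where the ABA problem and safe memory reclamation come into play (the classic treatment is the Michael-Scott queue):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QSIZE 8                 /* must be a power of two */

/* Bounded lock-free FIFO for exactly ONE producer and ONE consumer.
 * head/tail are free-running counters; unsigned wraparound is benign. */
typedef struct
{
    atomic_uint head;           /* next slot to read  */
    atomic_uint tail;           /* next slot to write */
    int         slots[QSIZE];
} spsc_queue;

bool
spsc_push(spsc_queue *q, int v)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&q->head, memory_order_acquire);

    if (tail - head == QSIZE)
        return false;           /* full */
    q->slots[tail % QSIZE] = v;
    /* release: publish the slot write before advancing tail */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}

bool
spsc_pop(spsc_queue *q, int *v)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_acquire);

    if (head == tail)
        return false;           /* empty */
    *v = q->slots[head % QSIZE];
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}
```

This works because each index is written by only one side; with many backends pushing and popping, both counters become contended CAS targets and a freed buffer's slot can be recycled while another backend still holds a stale pointer to it — which is the cross-platform difficulty Jeff is pointing at.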
On Mon, Jan 23, 2012 at 5:06 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 23, 2012 at 12:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sat, Jan 21, 2012 at 5:29 PM, Jim Nasby <jim@nasby.net> wrote:
We should also look at having the freelist do something useful, instead of just dropping it completely. Unfortunately that's probably more work...
That's kinda my feeling as well. The free list in its current form is
pretty much useless, but I don't think we'll save much by getting rid
of it, because that's just a single test. The expensive part of what
we do while holding BufFreelistLock is, I think, iterating through
buffers taking and releasing a spinlock on each one (!).

Yeah ... spinlocks that, by definition, will be uncontested.
What makes you think that they are uncontested?
It is automatically partitioned over 131,072 spinlocks if
shared_buffers=1GB, so the chances of contention seem pretty low.
If a few very hot index root blocks are heavily contested, still that
would only be encountered a few times out of the 131,072. I guess we
could count them, similar to your LWLOCK_STATS changes.
Or for that matter,
that even an uncontested spinlock operation is cheap enough to do
while holding a badly contended LWLock?
Is the concern that getting a uncontested spinlock has lots of
instructions, or that the associated fences are expensive?
So I think
it would be advisable to prove rather than just assume that that's a
problem.

It's pretty trivial to prove that there is a very serious problem with
BufFreelistLock. I'll admit I can't prove what the right fix is just
yet, and certainly measurement is warranted.
From my own analysis and experiments, the mere act of getting the
BufFreelistLock when it is contended is far more important than
anything actually done while holding that lock. When the clock-sweep
is done often enough that the lock is contended, then it is done often
enough to keep the average usagecount low and so the average number of
buffers visited during a clock sweep before finding a usable one was
around 2.0. YMMV.
Can we get some people who run busy high-CPU servers that churn the
cache and think they may be getting hit with this problem post the
results of this query:
select usagecount, count(*) from pg_buffercache group by usagecount;
Cheers,
Jeff
On Mon, Jan 23, 2012 at 4:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
The real
problem there is that BufFreelistLock is also used to protect the
clock sweep pointer.
Agreed
I think basically we gotta find a way to allow
multiple backends to run clock sweeps concurrently. Or else fix
things so that the freelist never (well, hardly ever) runs dry.
Agreed.
The only question is what do we do now. I'm happy with the thought of
jam tomorrow, I'd like to see some minor improvements now, given that
when we say "we'll do that even better in the next release" it often
doesn't happen. Which is where those patches come in.
I've posted an improved lock wait analysis patch and have scheduled
some runs on heavily used systems to see what this tells us. Results
from live machines also greatly appreciated, since test systems seem
likely to be inappropriate tests. Patch copied here again.
This is a key issue since RAM is cheap enough now that people are
swamped by it, so large shared_buffers are very common.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Your caution is wise. All users of an index have already checked
whether the index is usable at plan time, so although there is much
code that assumes they can look at the index itself, that is not
executed until after the correct checks.

I'll look at VACUUM and other utilities, so thanks for that.
I don't see much point in having the higher level lock, except perhaps
as a test this code works.
I thought of another way this can cause a problem: suppose that while
we're dropping the relation with only ShareUpdateExclusiveLock, we get
as far as calling DropRelFileNodeBuffers. Just after we finish, some
other process that holds AccessShareLock or RowShareLock or
RowExclusiveLock reads and perhaps even dirties a page in the
relation. Now we've got pages in the buffer pool that might even be
dirty, but the backing file is truncated or perhaps removed
altogether. If the page is dirty, I think the background writer will
eventually choke trying to write out the page. If the page is not
dirty, I'm less certain whether anything is going to explode
violently, but it seems mighty sketchy to have pages in the buffer
pool with a buffer tag that don't exist any more. As the comment
says:
* It is the responsibility of higher-level code to ensure that the
* deletion or truncation does not lose any data that could be needed
* later. It is also the responsibility of higher-level code to ensure
* that no other process could be trying to load more pages of the
* relation into buffers.
...and of course, the intended mechanism is an AccessExclusiveLock.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Jan 26, 2012 at 2:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Jan 21, 2012 at 10:11 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Your caution is wise. All users of an index have already checked
whether the index is usable at plan time, so although there is much
code that assumes they can look at the index itself, that is not
executed until after the correct checks.

I'll look at VACUUM and other utilities, so thanks for that.
I don't see much point in having the higher level lock, except perhaps
as a test this code works.

I thought of another way this can cause a problem: suppose that while
we're dropping the relation with only ShareUpdateExclusiveLock, we get
as far as calling DropRelFileNodeBuffers. Just after we finish, some
other process that holds AccessShareLock or RowShareLock or
RowExclusiveLock reads and perhaps even dirties a page in the
relation.
I can't see any way that situation can occur. The patch *explicitly*
waits until all people that can see the index as usable have dropped
their lock. So I don't think this is necessary. Having said that,
since we are talking about the index and not the whole table, if I
believe the above statement then I can't have any reasonable objection
to doing as you suggest.
Patch now locks index in AccessExclusiveLock in final stage of drop.
v3 attached.
If you have suggestions to improve the grammar issues, they're most
welcome. Otherwise this seems good to go.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
drop_index_concurrently.v3.patch (text/x-diff; charset=US-ASCII)
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..aeb1531 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,9 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX
+ [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+ CONCURRENTLY <replaceable class="PARAMETER">name</replaceable>
</synopsis>
</refsynopsisdiv>
@@ -50,6 +52,30 @@ DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ..
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will drop the
+ index without taking any locks that prevent concurrent selects, inserts,
+ updates, or deletes on the table; whereas a standard index drop
+ waits for a lock that locks out everything on the table until it's done.
+ Concurrent drop index is a two stage process. First, we mark the index
+ both invalid and not ready then commit the change. Next we wait until
+ there are no users locking the table who can see the index.
+ </para>
+ <para>
+ There are several caveats to be aware of when using this option.
+ Only one index name can be specified if the <literal>CONCURRENTLY</literal>
+ parameter is specified. Only one concurrent index drop can occur on a
+ table at a time and no modifications on the table are allowed meanwhile.
+ Regular <command>DROP INDEX</> command can be performed within
+ a transaction block, but <command>DROP INDEX CONCURRENTLY</> cannot.
+ There is no CASCADE option when dropping an index concurrently.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="PARAMETER">name</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index db86262..58d8f5c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
const ObjectAddress *origObject);
static void deleteOneObject(const ObjectAddress *object,
Relation depRel, int32 flags);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void doDeletion(const ObjectAddress *object, int flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int flags);
static void ReleaseDeletionLock(const ObjectAddress *object);
static bool find_expr_references_walker(Node *node,
find_expr_references_context *context);
@@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object,
* Acquire deletion lock on the target object. (Ideally the caller has
* done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(object);
+ AcquireDeletionLock(object, 0);
/*
* Construct a list of objects to delete (ie, the given object plus
@@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
* Acquire deletion lock on each target object. (Ideally the caller
* has done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(thisobj);
+ AcquireDeletionLock(thisobj, flags);
findDependentObjects(thisobj,
DEPFLAG_ORIGINAL,
@@ -350,7 +350,11 @@ performMultipleDeletions(const ObjectAddresses *objects,
/* And clean up */
free_object_addresses(targetObjects);
- heap_close(depRel, RowExclusiveLock);
+ /*
+ * We closed depRel earlier in deleteOneObject if doing a drop concurrently
+ */
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) != PERFORM_DELETION_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
}
/*
@@ -380,7 +384,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
* Acquire deletion lock on the target object. (Ideally the caller has
* done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(object);
+ AcquireDeletionLock(object, 0);
/*
* Construct a list of objects to delete (ie, the given object plus
@@ -611,7 +615,7 @@ findDependentObjects(const ObjectAddress *object,
* deletion of the owning object.)
*/
ReleaseDeletionLock(object);
- AcquireDeletionLock(&otherObject);
+ AcquireDeletionLock(&otherObject, 0);
/*
* The owning object might have been deleted while we waited
@@ -706,7 +710,7 @@ findDependentObjects(const ObjectAddress *object,
/*
* Must lock the dependent object before recursing to it.
*/
- AcquireDeletionLock(&otherObject);
+ AcquireDeletionLock(&otherObject, 0);
/*
* The dependent object might have been deleted while we waited to
@@ -1016,9 +1020,16 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
object->objectSubId);
/*
+ * Close depRel if we are doing a drop concurrently because it
+ * commits the transaction, so we don't want dangling references.
+ */
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) == PERFORM_DELETION_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
+
+ /*
* Now delete the object itself, in an object-type-dependent way.
*/
- doDeletion(object);
+ doDeletion(object, flags);
/*
* Delete any comments or security labels associated with this object.
@@ -1043,7 +1054,7 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
* doDeletion: actually delete a single object
*/
static void
-doDeletion(const ObjectAddress *object)
+doDeletion(const ObjectAddress *object, int flags)
{
switch (getObjectClass(object))
{
@@ -1053,8 +1064,11 @@ doDeletion(const ObjectAddress *object)
if (relKind == RELKIND_INDEX)
{
+ bool concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY)
+ == PERFORM_DELETION_CONCURRENTLY);
+
Assert(object->objectSubId == 0);
- index_drop(object->objectId);
+ index_drop(object->objectId, concurrent);
}
else
{
@@ -1190,10 +1204,15 @@ doDeletion(const ObjectAddress *object)
* shared-across-databases object, so we have no need for LockSharedObject.
*/
static void
-AcquireDeletionLock(const ObjectAddress *object)
+AcquireDeletionLock(const ObjectAddress *object, int flags)
{
if (object->classId == RelationRelationId)
- LockRelationOid(object->objectId, AccessExclusiveLock);
+ {
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) == PERFORM_DELETION_CONCURRENTLY)
+ LockRelationOid(object->objectId, ShareUpdateExclusiveLock);
+ else
+ LockRelationOid(object->objectId, AccessExclusiveLock);
+ }
else
/* assume we should lock the whole object not a sub-object */
LockDatabaseObject(object->classId, object->objectId, 0,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bfbe642..c6708d3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1282,7 +1282,7 @@ index_constraint_create(Relation heapRelation,
* else associated dependencies won't be cleaned up.
*/
void
-index_drop(Oid indexId)
+index_drop(Oid indexId, bool concurrent)
{
Oid heapId;
Relation userHeapRelation;
@@ -1290,6 +1290,12 @@ index_drop(Oid indexId)
Relation indexRelation;
HeapTuple tuple;
bool hasexprs;
+ LockRelId heaprelid,
+ indexrelid;
+ LOCKTAG heaplocktag,
+ indexlocktag;
+ VirtualTransactionId *old_lockholders;
+ Form_pg_index indexForm;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1302,17 +1308,122 @@ index_drop(Oid indexId)
* that will make them update their index lists.
*/
heapId = IndexGetRelation(indexId, false);
- userHeapRelation = heap_open(heapId, AccessExclusiveLock);
-
- userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ if (concurrent)
+ {
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ }
+ else
+ {
+ userHeapRelation = heap_open(heapId, AccessExclusiveLock);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ }
/*
- * There can no longer be anyone *else* touching the index, but we might
- * still have open queries using it in our own session.
+ * We might still have open queries using it in our own session.
*/
CheckTableNotInUse(userIndexRelation, "DROP INDEX");
/*
+ * Drop Index concurrently is similar in many ways to creating an
+ * index concurrently, so some actions are similar to DefineIndex()
+ */
+ if (concurrent)
+ {
+ /*
+ * Mark index invalid by updating its pg_index entry
+ *
+ * Don't Assert(indexForm->indisvalid) because we may be trying to
+ * clear up after an error when trying to create an index which left
+ * the index invalid
+ */
+ indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexId));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for index %u", indexId);
+ indexForm = (Form_pg_index) GETSTRUCT(tuple);
+
+ indexForm->indisvalid = false; /* make unusable for queries */
+ indexForm->indisready = false; /* make invisible to changes */
+
+ simple_heap_update(indexRelation, &tuple->t_self, tuple);
+ CatalogUpdateIndexes(indexRelation, tuple);
+
+ heap_close(indexRelation, RowExclusiveLock);
+
+ /* save lockrelid and locktag for below, then close but keep locks */
+ heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
+ heap_close(userHeapRelation, NoLock);
+
+ indexrelid = userIndexRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(indexlocktag, indexrelid.dbId, indexrelid.relId);
+ index_close(userIndexRelation, NoLock);
+
+ /*
+ * For a concurrent drop, it's important to make the catalog entries
+ * visible to other transactions before we drop the index. The index
+ * will be marked not indisvalid, so that no one else tries to either
+ * insert into it or use it for queries.
+ *
+ * We must commit our current transaction so that the index update becomes
+ * visible; then start another. Note that all the data structures we just
+ * built are lost in the commit. The only data we keep past here are the
+ * relation IDs.
+ *
+ * Before committing, get a session-level lock on the table, to ensure
+ * that neither it nor the index can be dropped before we finish. This
+ * cannot block, even if someone else is waiting for access, because we
+ * already have the same lock within our transaction.
+ */
+ LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ LockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Now we must wait until no running transaction could have the table open
+ * with the old list of indexes. To do this, inquire which xacts
+ * currently would conflict with AccessExclusiveLock on the table -- ie,
+ * which ones have a lock of any kind on the table. Then wait for each of
+ * these xacts to commit or abort. Note we do not need to worry about
+ * xacts that open the table for writing after this point; they will see
+ * the index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than just
+ * checking the ProcArray and sleeping, is that deadlock is possible if
+ * one of the transactions in question is blocked trying to acquire an
+ * exclusive lock on our table. The lock code will detect deadlock and
+ * error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+ old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+
+ /*
+ * Re-open relations to allow us to complete our actions.
+ *
+ * At this point, nothing should be accessing the index, but let's
+ * leave nothing to chance and grab AccessExclusiveLock on the index
+ * before the physical deletion.
+ */
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ }
+
+ /*
* All predicate locks on the index are about to be made invalid. Promote
* them to relation locks on the heap.
*/
@@ -1378,6 +1489,15 @@ index_drop(Oid indexId)
* Close owning rel, but keep lock
*/
heap_close(userHeapRelation, NoLock);
+
+ /*
+ * Release the session locks before we go.
+ */
+ if (concurrent)
+ {
+ UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ UnlockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07dc326..6601c3a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -239,6 +239,7 @@ struct DropRelationCallbackState
{
char relkind;
Oid heapOid;
+ bool concurrent;
};
/* Alter table target-type flags for ATSimplePermissions */
@@ -735,6 +736,7 @@ RemoveRelations(DropStmt *drop)
ObjectAddresses *objects;
char relkind;
ListCell *cell;
+ int flags = 0;
/*
* First we identify all the relations, then we delete them in a single
@@ -797,7 +799,14 @@ RemoveRelations(DropStmt *drop)
/* Look up the appropriate relation using namespace search. */
state.relkind = relkind;
state.heapOid = InvalidOid;
- relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock, true,
+ state.concurrent = drop->concurrent;
+ if (drop->concurrent)
+ relOid = RangeVarGetRelidExtended(rel, ShareUpdateExclusiveLock, true,
+ false,
+ RangeVarCallbackForDropRelation,
+ (void *) &state);
+ else
+ relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock, true,
false,
RangeVarCallbackForDropRelation,
(void *) &state);
@@ -817,7 +826,20 @@ RemoveRelations(DropStmt *drop)
add_exact_object_address(&obj, objects);
}
- performMultipleDeletions(objects, drop->behavior, 0);
+ /*
+ * Set options and check further requirements for concurrent drop
+ */
+ if (drop->concurrent)
+ {
+ /*
+ * Confirm that concurrent behaviour is restricted in grammar.
+ */
+ Assert(drop->removeType == OBJECT_INDEX);
+
+ flags |= PERFORM_DELETION_CONCURRENTLY;
+ }
+
+ performMultipleDeletions(objects, drop->behavior, flags);
free_object_addresses(objects);
}
@@ -886,7 +908,8 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
{
state->heapOid = IndexGetRelation(relOid, true);
if (OidIsValid(state->heapOid))
- LockRelationOid(state->heapOid, AccessExclusiveLock);
+ LockRelationOid(state->heapOid,
+ (state->concurrent ? ShareUpdateExclusiveLock: AccessExclusiveLock));
}
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index cc3168d..3c60dfb 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2763,6 +2763,7 @@ _copyDropStmt(const DropStmt *from)
COPY_SCALAR_FIELD(removeType);
COPY_SCALAR_FIELD(behavior);
COPY_SCALAR_FIELD(missing_ok);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 2295195..597e2a7 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1189,6 +1189,7 @@ _equalDropStmt(const DropStmt *a, const DropStmt *b)
COMPARE_SCALAR_FIELD(removeType);
COMPARE_SCALAR_FIELD(behavior);
COMPARE_SCALAR_FIELD(missing_ok);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 62fde67..9beabcd 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -213,7 +213,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
CreatedbStmt DeclareCursorStmt DefineStmt DeleteStmt DiscardStmt DoStmt
DropGroupStmt DropOpClassStmt DropOpFamilyStmt DropPLangStmt DropStmt
DropAssertStmt DropTrigStmt DropRuleStmt DropCastStmt DropRoleStmt
- DropUserStmt DropdbStmt DropTableSpaceStmt DropFdwStmt
+ DropUserStmt DropdbStmt DropTableSpaceStmt DropFdwStmt DropIndexStmt
DropForeignServerStmt DropUserMappingStmt ExplainStmt FetchStmt
GrantStmt GrantRoleStmt IndexStmt InsertStmt ListenStmt LoadStmt
LockStmt NotifyStmt ExplainableStmt PreparableStmt
@@ -307,7 +307,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
opt_column_list columnList opt_name_list
sort_clause opt_sort_clause sortby_list index_params
name_list from_clause from_list opt_array_bounds
- qualified_name_list any_name any_name_list
+ qualified_name_list any_name any_name_list any_name_single
any_operator expr_list attrs
target_list insert_column_list set_target_list
set_clause_list set_clause multiple_set_clause
@@ -741,6 +741,7 @@ stmt :
| DropFdwStmt
| DropForeignServerStmt
| DropGroupStmt
+ | DropIndexStmt
| DropOpClassStmt
| DropOpFamilyStmt
| DropOwnedStmt
@@ -4803,7 +4804,6 @@ DropStmt: DROP drop_type IF_P EXISTS any_name_list opt_drop_behavior
drop_type: TABLE { $$ = OBJECT_TABLE; }
| SEQUENCE { $$ = OBJECT_SEQUENCE; }
| VIEW { $$ = OBJECT_VIEW; }
- | INDEX { $$ = OBJECT_INDEX; }
| FOREIGN TABLE { $$ = OBJECT_FOREIGN_TABLE; }
| TYPE_P { $$ = OBJECT_TYPE; }
| DOMAIN_P { $$ = OBJECT_DOMAIN; }
@@ -4832,6 +4832,44 @@ attrs: '.' attr_name
{ $$ = lappend($1, makeString($3)); }
;
+any_name_single:
+ any_name { $$ = list_make1($1); }
+ ;
+
+DropIndexStmt: DROP INDEX IF_P EXISTS any_name_list
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = TRUE;
+ n->objects = $5;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = false;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX any_name_list
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = FALSE;
+ n->objects = $3;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = false;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX CONCURRENTLY any_name_single
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = FALSE;
+ n->objects = $4;
+ n->arguments = NIL;
+ n->behavior = DROP_RESTRICT;
+ n->concurrent = true;
+ $$ = (Node *)n;
+ }
+ ;
/*****************************************************************************
*
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 5b81c0b..1e1d7a4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -631,10 +631,15 @@ standard_ProcessUtility(Node *parsetree,
case T_DropStmt:
switch (((DropStmt *) parsetree)->removeType)
{
+ case OBJECT_INDEX:
+ if (((DropStmt *) parsetree)->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "DROP INDEX CONCURRENTLY");
+ /* fall through */
+
case OBJECT_TABLE:
case OBJECT_SEQUENCE:
case OBJECT_VIEW:
- case OBJECT_INDEX:
case OBJECT_FOREIGN_TABLE:
RemoveRelations((DropStmt *) parsetree);
break;
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 28e68c5..f0eb564 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,6 +153,7 @@ typedef enum ObjectClass
/* in dependency.c */
#define PERFORM_DELETION_INTERNAL 0x0001
+#define PERFORM_DELETION_CONCURRENTLY 0x0002
extern void performDeletion(const ObjectAddress *object,
DropBehavior behavior, int flags);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index c7f1dd2..3f73a6c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -63,7 +63,7 @@ extern void index_constraint_create(Relation heapRelation,
bool update_pgindex,
bool allow_system_table_mods);
-extern void index_drop(Oid indexId);
+extern void index_drop(Oid indexId, bool concurrent);
extern IndexInfo *BuildIndexInfo(Relation index);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 1d33ceb..1c09042 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1908,6 +1908,7 @@ typedef struct DropStmt
ObjectType removeType; /* object type */
DropBehavior behavior; /* RESTRICT or CASCADE behavior */
bool missing_ok; /* skip error if object is missing? */
+ bool concurrent; /* drop index concurrently? */
} DropStmt;
/* ----------------------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index b1fcada..b4d2b0d 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2204,6 +2204,38 @@ Indexes:
"concur_index5" btree (f2) WHERE f1 = 'x'::text
"std_index" btree (f2)
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2";
+-- failures
+DROP INDEX IF EXISTS CONCURRENTLY;
+ERROR: syntax error at or near "CONCURRENTLY"
+LINE 1: DROP INDEX IF EXISTS CONCURRENTLY;
+ ^
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+ERROR: syntax error at or near ","
+LINE 1: DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+ ^
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ERROR: DROP INDEX CONCURRENTLY cannot run inside a transaction block
+ROLLBACK;
+-- successes
+DROP INDEX CONCURRENTLY "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+\d concur_heap
+Table "public.concur_heap"
+ Column | Type | Modifiers
+--------+------+-----------
+ f1 | text |
+ f2 | text |
+Indexes:
+ "std_index" btree (f2)
+
DROP TABLE concur_heap;
--
-- Test ADD CONSTRAINT USING INDEX
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 5e5fc22..77c3613 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -695,6 +695,27 @@ COMMIT;
\d concur_heap
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2";
+
+-- failures
+DROP INDEX IF EXISTS CONCURRENTLY;
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ROLLBACK;
+
+-- successes
+DROP INDEX CONCURRENTLY "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+
+\d concur_heap
+
DROP TABLE concur_heap;
--
On Sun, Jan 29, 2012 at 5:01 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
I can't see any way that situation can occur. The patch *explicitly*
waits until all people that can see the index as usable have dropped
their lock. So I don't think this is necessary. Having said that,
since we are talking about the index and not the whole table, if I
believe the above statement then I can't have any reasonable objection
to doing as you suggest.Patch now locks index in AccessExclusiveLock in final stage of drop.
v3 attached.
If you have suggestions to improve the grammar issues, they're most
welcome. Otherwise this seems good to go.
I improved the grammar issues in the attached version of the patch -
the syntax is now simpler and more consistent, IF EXISTS now works,
and RESTRICT is accepted (without changing the behavior) while CASCADE
fails with a nicer error message. I also fixed a bug in
RangeVarCallbackForDropRelation.
However, testing reveals that this doesn't really work. I found that
by playing around with a couple of concurrent sessions, I could get a
SELECT statement to hang waiting on the AccessExclusiveLock in the
second phase, because we're really still opening the relation (even
though it's invalid) and thus still locking it, and thus still waiting
for the AccessExclusiveLock. And I managed to get this:
rhaas=# select * from foo where a = 1;
ERROR: could not open relation with OID 16388
So then I tried changing the lock level back to
ShareUpdateExclusiveLock. It appears that that merely narrows the
window for errors of this type, rather than getting rid of them. I
ran this in one window:
pgbench -c 30 -j 30 -f f -n -T 180
where the file f contained this:
select * from foo where a = 1
and then in the other window I repeatedly did this:
rhaas=# create index foo_a on foo (a);
CREATE INDEX
rhaas=# drop index concurrently foo_a;
DROP INDEX
and pgbench started issuing messages like this:
Client 2 aborted in state 0: ERROR: could not open relation with OID 16394
Client 9 aborted in state 0: ERROR: could not open relation with OID 16397
Client 18 aborted in state 0: ERROR: could not open relation with OID 16397
My conclusion is that regardless of whether ShareUpdateExclusiveLock
or AccessExclusiveLock is used for the final phase of the drop, this
isn't safe. I haven't tracked down exactly where the wheels are
coming off, but the problem does not occur when using the non-concurrent
form of DROP INDEX.
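The failure mode above can be sketched abstractly: a backend that cached the index's OID (e.g. in a prepared plan) before the invalidation committed, but which holds no lock during the wait phase and only re-executes after the physical drop, will try to open an OID that no longer exists. Here is a toy simulation of that race; the names and structure are invented for illustration and are not PostgreSQL code:

```python
# Toy model of the two-phase concurrent drop and the stale-plan race.
# Purely illustrative: names and structure are invented, not PostgreSQL's.

catalog = {16394: {"valid": True}}   # oid -> index metadata

class Session:
    def __init__(self):
        self.cached_plan = None

    def plan_query(self):
        # Planner caches the OIDs of all valid indexes it saw.
        self.cached_plan = [oid for oid, m in catalog.items() if m["valid"]]

    def execute(self):
        # Execution re-opens each relation in the cached plan by OID.
        for oid in self.cached_plan:
            if oid not in catalog:
                raise LookupError("could not open relation with OID %d" % oid)
        return "ok"

def drop_index_concurrently(oid, sessions_holding_locks):
    catalog[oid]["valid"] = False     # phase 1: mark invalid, "commit"
    sessions_holding_locks.clear()    # phase 2: wait out *current* lockers only
    del catalog[oid]                  # phase 3: physical drop

s = Session()
s.plan_query()                        # plan built while the index was valid
drop_index_concurrently(16394, set()) # s holds no lock, so the wait misses it
try:
    s.execute()                       # cached plan still references old OID
except LookupError as e:
    print(e)                          # -> could not open relation with OID 16394
```

The wait in phase 2 only covers backends currently holding a lock on the table, so a plan cached earlier slips through, which matches the pgbench symptoms reported above.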
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
drop_index_concurrently.v4.patch
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..343f7ac 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
</synopsis>
</refsynopsisdiv>
@@ -50,6 +50,29 @@ DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ..
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will drop the
+ index without taking any locks that prevent concurrent selects, inserts,
+ updates, or deletes on the table; whereas a standard index drop
+ waits for a lock that locks out everything on the table until it's done.
+ A concurrent index drop is a two-stage process. First, we mark the index
+ both invalid and not ready, then commit the change. Next, we wait until
+ there are no users locking the table who can still see the index.
+ </para>
+ <para>
+ There are several caveats to be aware of when using this option.
+ Only one index name can be specified when the <literal>CONCURRENTLY</literal>
+ parameter is used. A regular <command>DROP INDEX</> command can be
+ performed within a transaction block, but
+ <command>DROP INDEX CONCURRENTLY</> cannot.
+ The <literal>CASCADE</literal> option is not supported when dropping
+ an index concurrently.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="PARAMETER">name</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index db86262..58d8f5c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
const ObjectAddress *origObject);
static void deleteOneObject(const ObjectAddress *object,
Relation depRel, int32 flags);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void doDeletion(const ObjectAddress *object, int flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int flags);
static void ReleaseDeletionLock(const ObjectAddress *object);
static bool find_expr_references_walker(Node *node,
find_expr_references_context *context);
@@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object,
* Acquire deletion lock on the target object. (Ideally the caller has
* done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(object);
+ AcquireDeletionLock(object, 0);
/*
* Construct a list of objects to delete (ie, the given object plus
@@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
* Acquire deletion lock on each target object. (Ideally the caller
* has done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(thisobj);
+ AcquireDeletionLock(thisobj, flags);
findDependentObjects(thisobj,
DEPFLAG_ORIGINAL,
@@ -350,7 +350,11 @@ performMultipleDeletions(const ObjectAddresses *objects,
/* And clean up */
free_object_addresses(targetObjects);
- heap_close(depRel, RowExclusiveLock);
+ /*
+ * We closed depRel earlier in deleteOneObject if doing a drop concurrently
+ */
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) != PERFORM_DELETION_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
}
/*
@@ -380,7 +384,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
* Acquire deletion lock on the target object. (Ideally the caller has
* done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(object);
+ AcquireDeletionLock(object, 0);
/*
* Construct a list of objects to delete (ie, the given object plus
@@ -611,7 +615,7 @@ findDependentObjects(const ObjectAddress *object,
* deletion of the owning object.)
*/
ReleaseDeletionLock(object);
- AcquireDeletionLock(&otherObject);
+ AcquireDeletionLock(&otherObject, 0);
/*
* The owning object might have been deleted while we waited
@@ -706,7 +710,7 @@ findDependentObjects(const ObjectAddress *object,
/*
* Must lock the dependent object before recursing to it.
*/
- AcquireDeletionLock(&otherObject);
+ AcquireDeletionLock(&otherObject, 0);
/*
* The dependent object might have been deleted while we waited to
@@ -1016,9 +1020,16 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
object->objectSubId);
/*
+ * Close depRel if we are doing a drop concurrently because it
+ * commits the transaction, so we don't want dangling references.
+ */
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) == PERFORM_DELETION_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
+
+ /*
* Now delete the object itself, in an object-type-dependent way.
*/
- doDeletion(object);
+ doDeletion(object, flags);
/*
* Delete any comments or security labels associated with this object.
@@ -1043,7 +1054,7 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
* doDeletion: actually delete a single object
*/
static void
-doDeletion(const ObjectAddress *object)
+doDeletion(const ObjectAddress *object, int flags)
{
switch (getObjectClass(object))
{
@@ -1053,8 +1064,11 @@ doDeletion(const ObjectAddress *object)
if (relKind == RELKIND_INDEX)
{
+ bool concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY)
+ == PERFORM_DELETION_CONCURRENTLY);
+
Assert(object->objectSubId == 0);
- index_drop(object->objectId);
+ index_drop(object->objectId, concurrent);
}
else
{
@@ -1190,10 +1204,15 @@ doDeletion(const ObjectAddress *object)
* shared-across-databases object, so we have no need for LockSharedObject.
*/
static void
-AcquireDeletionLock(const ObjectAddress *object)
+AcquireDeletionLock(const ObjectAddress *object, int flags)
{
if (object->classId == RelationRelationId)
- LockRelationOid(object->objectId, AccessExclusiveLock);
+ {
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) == PERFORM_DELETION_CONCURRENTLY)
+ LockRelationOid(object->objectId, ShareUpdateExclusiveLock);
+ else
+ LockRelationOid(object->objectId, AccessExclusiveLock);
+ }
else
/* assume we should lock the whole object not a sub-object */
LockDatabaseObject(object->classId, object->objectId, 0,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bfbe642..c6708d3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1282,7 +1282,7 @@ index_constraint_create(Relation heapRelation,
* else associated dependencies won't be cleaned up.
*/
void
-index_drop(Oid indexId)
+index_drop(Oid indexId, bool concurrent)
{
Oid heapId;
Relation userHeapRelation;
@@ -1290,6 +1290,12 @@ index_drop(Oid indexId)
Relation indexRelation;
HeapTuple tuple;
bool hasexprs;
+ LockRelId heaprelid,
+ indexrelid;
+ LOCKTAG heaplocktag,
+ indexlocktag;
+ VirtualTransactionId *old_lockholders;
+ Form_pg_index indexForm;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1302,17 +1308,122 @@ index_drop(Oid indexId)
* that will make them update their index lists.
*/
heapId = IndexGetRelation(indexId, false);
- userHeapRelation = heap_open(heapId, AccessExclusiveLock);
-
- userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ if (concurrent)
+ {
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ }
+ else
+ {
+ userHeapRelation = heap_open(heapId, AccessExclusiveLock);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ }
/*
- * There can no longer be anyone *else* touching the index, but we might
- * still have open queries using it in our own session.
+ * We might still have open queries using it in our own session.
*/
CheckTableNotInUse(userIndexRelation, "DROP INDEX");
/*
+ * Drop Index concurrently is similar in many ways to creating an
+ * index concurrently, so some actions are similar to DefineIndex()
+ */
+ if (concurrent)
+ {
+ /*
+ * Mark index invalid by updating its pg_index entry
+ *
+ * Don't Assert(indexForm->indisvalid) because we may be trying to
+ * clear up after an error when trying to create an index which left
+ * the index invalid
+ */
+ indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexId));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for index %u", indexId);
+ indexForm = (Form_pg_index) GETSTRUCT(tuple);
+
+ indexForm->indisvalid = false; /* make unusable for queries */
+ indexForm->indisready = false; /* make invisible to changes */
+
+ simple_heap_update(indexRelation, &tuple->t_self, tuple);
+ CatalogUpdateIndexes(indexRelation, tuple);
+
+ heap_close(indexRelation, RowExclusiveLock);
+
+ /* save lockrelid and locktag for below, then close but keep locks */
+ heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
+ heap_close(userHeapRelation, NoLock);
+
+ indexrelid = userIndexRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(indexlocktag, indexrelid.dbId, indexrelid.relId);
+ index_close(userIndexRelation, NoLock);
+
+ /*
+ * For a concurrent drop, it's important to make the catalog entries
+ * visible to other transactions before we drop the index. The index
+ * will be marked not indisvalid, so that no one else tries to either
+ * insert into it or use it for queries.
+ *
+ * We must commit our current transaction so that the index update becomes
+ * visible; then start another. Note that all the data structures we just
+ * built are lost in the commit. The only data we keep past here are the
+ * relation IDs.
+ *
+ * Before committing, get a session-level lock on the table, to ensure
+ * that neither it nor the index can be dropped before we finish. This
+ * cannot block, even if someone else is waiting for access, because we
+ * already have the same lock within our transaction.
+ */
+ LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ LockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Now we must wait until no running transaction could have the table open
+ * with the old list of indexes. To do this, inquire which xacts
+ * currently would conflict with AccessExclusiveLock on the table -- ie,
+ * which ones have a lock of any kind on the table. Then wait for each of
+ * these xacts to commit or abort. Note we do not need to worry about
+ * xacts that open the table for writing after this point; they will see
+ * the index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than just
+ * checking the ProcArray and sleeping, is that deadlock is possible if
+ * one of the transactions in question is blocked trying to acquire an
+ * exclusive lock on our table. The lock code will detect deadlock and
+ * error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+ old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+
+ /*
+ * Re-open relations to allow us to complete our actions.
+ *
+ * At this point, nothing should be accessing the index, but let's
+ * leave nothing to chance and grab AccessExclusiveLock on the index
+ * before the physical deletion.
+ */
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ }
+
+ /*
* All predicate locks on the index are about to be made invalid. Promote
* them to relation locks on the heap.
*/
@@ -1378,6 +1489,15 @@ index_drop(Oid indexId)
* Close owning rel, but keep lock
*/
heap_close(userHeapRelation, NoLock);
+
+ /*
+ * Release the session locks before we go.
+ */
+ if (concurrent)
+ {
+ UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ UnlockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07dc326..751c9b7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -239,6 +239,7 @@ struct DropRelationCallbackState
{
char relkind;
Oid heapOid;
+ bool concurrent;
};
/* Alter table target-type flags for ATSimplePermissions */
@@ -735,6 +736,21 @@ RemoveRelations(DropStmt *drop)
ObjectAddresses *objects;
char relkind;
ListCell *cell;
+ int flags = 0;
+ LOCKMODE lockmode = AccessExclusiveLock;
+
+ if (drop->concurrent)
+ {
+ lockmode = ShareUpdateExclusiveLock;
+ if (list_length(drop->objects) > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("DROP INDEX CONCURRENTLY does not support dropping multiple objects")));
+ if (drop->behavior == DROP_CASCADE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("DROP INDEX CONCURRENTLY does not support CASCADE")));
+ }
/*
* First we identify all the relations, then we delete them in a single
@@ -797,7 +813,8 @@ RemoveRelations(DropStmt *drop)
/* Look up the appropriate relation using namespace search. */
state.relkind = relkind;
state.heapOid = InvalidOid;
- relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock, true,
+ state.concurrent = drop->concurrent;
+ relOid = RangeVarGetRelidExtended(rel, lockmode, true,
false,
RangeVarCallbackForDropRelation,
(void *) &state);
@@ -817,7 +834,20 @@ RemoveRelations(DropStmt *drop)
add_exact_object_address(&obj, objects);
}
- performMultipleDeletions(objects, drop->behavior, 0);
+ /*
+ * Set options and check further requirements for concurrent drop
+ */
+ if (drop->concurrent)
+ {
+ /*
+ * Confirm that concurrent behaviour is restricted in grammar.
+ */
+ Assert(drop->removeType == OBJECT_INDEX);
+
+ flags |= PERFORM_DELETION_CONCURRENTLY;
+ }
+
+ performMultipleDeletions(objects, drop->behavior, flags);
free_object_addresses(objects);
}
@@ -834,9 +864,12 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
struct DropRelationCallbackState *state;
char relkind;
Form_pg_class classform;
+ LOCKMODE heap_lockmode;
state = (struct DropRelationCallbackState *) arg;
relkind = state->relkind;
+ heap_lockmode = state->concurrent ?
+ ShareUpdateExclusiveLock : AccessExclusiveLock;
/*
* If we previously locked some other index's heap, and the name we're
@@ -845,7 +878,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
*/
if (relOid != oldRelOid && OidIsValid(state->heapOid))
{
- UnlockRelationOid(state->heapOid, AccessExclusiveLock);
+ UnlockRelationOid(state->heapOid, heap_lockmode);
state->heapOid = InvalidOid;
}
@@ -886,7 +919,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
{
state->heapOid = IndexGetRelation(relOid, true);
if (OidIsValid(state->heapOid))
- LockRelationOid(state->heapOid, AccessExclusiveLock);
+ LockRelationOid(state->heapOid, heap_lockmode);
}
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index cc3168d..3c60dfb 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2763,6 +2763,7 @@ _copyDropStmt(const DropStmt *from)
COPY_SCALAR_FIELD(removeType);
COPY_SCALAR_FIELD(behavior);
COPY_SCALAR_FIELD(missing_ok);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 2295195..597e2a7 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1189,6 +1189,7 @@ _equalDropStmt(const DropStmt *a, const DropStmt *b)
COMPARE_SCALAR_FIELD(removeType);
COMPARE_SCALAR_FIELD(behavior);
COMPARE_SCALAR_FIELD(missing_ok);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 62fde67..eab00d7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3271,6 +3271,7 @@ DropPLangStmt:
n->arguments = NIL;
n->behavior = $5;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP opt_procedural LANGUAGE IF_P EXISTS ColId_or_Sconst opt_drop_behavior
@@ -3280,6 +3281,7 @@ DropPLangStmt:
n->objects = list_make1(list_make1(makeString($6)));
n->behavior = $7;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -3675,6 +3677,7 @@ DropFdwStmt: DROP FOREIGN DATA_P WRAPPER name opt_drop_behavior
n->arguments = NIL;
n->missing_ok = false;
n->behavior = $6;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP FOREIGN DATA_P WRAPPER IF_P EXISTS name opt_drop_behavior
@@ -3685,6 +3688,7 @@ DropFdwStmt: DROP FOREIGN DATA_P WRAPPER name opt_drop_behavior
n->arguments = NIL;
n->missing_ok = true;
n->behavior = $8;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -3835,6 +3839,7 @@ DropForeignServerStmt: DROP SERVER name opt_drop_behavior
n->arguments = NIL;
n->missing_ok = false;
n->behavior = $4;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP SERVER IF_P EXISTS name opt_drop_behavior
@@ -3845,6 +3850,7 @@ DropForeignServerStmt: DROP SERVER name opt_drop_behavior
n->arguments = NIL;
n->missing_ok = true;
n->behavior = $6;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -4232,6 +4238,7 @@ DropTrigStmt:
n->arguments = NIL;
n->behavior = $6;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP TRIGGER IF_P EXISTS name ON any_name opt_drop_behavior
@@ -4242,6 +4249,7 @@ DropTrigStmt:
n->arguments = NIL;
n->behavior = $8;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -4702,6 +4710,7 @@ DropOpClassStmt:
n->removeType = OBJECT_OPCLASS;
n->behavior = $7;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP OPERATOR CLASS IF_P EXISTS any_name USING access_method opt_drop_behavior
@@ -4712,6 +4721,7 @@ DropOpClassStmt:
n->removeType = OBJECT_OPCLASS;
n->behavior = $9;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -4725,6 +4735,7 @@ DropOpFamilyStmt:
n->removeType = OBJECT_OPFAMILY;
n->behavior = $7;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP OPERATOR FAMILY IF_P EXISTS any_name USING access_method opt_drop_behavior
@@ -4735,6 +4746,7 @@ DropOpFamilyStmt:
n->removeType = OBJECT_OPFAMILY;
n->behavior = $9;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -4785,6 +4797,7 @@ DropStmt: DROP drop_type IF_P EXISTS any_name_list opt_drop_behavior
n->objects = $5;
n->arguments = NIL;
n->behavior = $6;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP drop_type any_name_list opt_drop_behavior
@@ -4795,6 +4808,29 @@ DropStmt: DROP drop_type IF_P EXISTS any_name_list opt_drop_behavior
n->objects = $3;
n->arguments = NIL;
n->behavior = $4;
+ n->concurrent = false;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX CONCURRENTLY any_name_list opt_drop_behavior
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = FALSE;
+ n->objects = $4;
+ n->arguments = NIL;
+ n->behavior = $5;
+ n->concurrent = true;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX CONCURRENTLY IF_P EXISTS any_name_list opt_drop_behavior
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = TRUE;
+ n->objects = $6;
+ n->arguments = NIL;
+ n->behavior = $7;
+ n->concurrent = true;
$$ = (Node *)n;
}
;
@@ -6233,6 +6269,7 @@ RemoveFuncStmt:
n->arguments = list_make1(extractArgTypes($4));
n->behavior = $5;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP FUNCTION IF_P EXISTS func_name func_args opt_drop_behavior
@@ -6243,6 +6280,7 @@ RemoveFuncStmt:
n->arguments = list_make1(extractArgTypes($6));
n->behavior = $7;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -6256,6 +6294,7 @@ RemoveAggrStmt:
n->arguments = list_make1($4);
n->behavior = $5;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP AGGREGATE IF_P EXISTS func_name aggr_args opt_drop_behavior
@@ -6266,6 +6305,7 @@ RemoveAggrStmt:
n->arguments = list_make1($6);
n->behavior = $7;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -6279,6 +6319,7 @@ RemoveOperStmt:
n->arguments = list_make1($4);
n->behavior = $5;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP OPERATOR IF_P EXISTS any_operator oper_argtypes opt_drop_behavior
@@ -6289,6 +6330,7 @@ RemoveOperStmt:
n->arguments = list_make1($6);
n->behavior = $7;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -6405,6 +6447,7 @@ DropCastStmt: DROP CAST opt_if_exists '(' Typename AS Typename ')' opt_drop_beha
n->arguments = list_make1(list_make1($7));
n->behavior = $9;
n->missing_ok = $3;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -7306,6 +7349,7 @@ DropRuleStmt:
n->arguments = NIL;
n->behavior = $6;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP RULE IF_P EXISTS name ON any_name opt_drop_behavior
@@ -7316,6 +7360,7 @@ DropRuleStmt:
n->arguments = NIL;
n->behavior = $8;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 5b81c0b..1e1d7a4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -631,10 +631,15 @@ standard_ProcessUtility(Node *parsetree,
case T_DropStmt:
switch (((DropStmt *) parsetree)->removeType)
{
+ case OBJECT_INDEX:
+ if (((DropStmt *) parsetree)->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "DROP INDEX CONCURRENTLY");
+ /* fall through */
+
case OBJECT_TABLE:
case OBJECT_SEQUENCE:
case OBJECT_VIEW:
- case OBJECT_INDEX:
case OBJECT_FOREIGN_TABLE:
RemoveRelations((DropStmt *) parsetree);
break;
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 28e68c5..f0eb564 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,6 +153,7 @@ typedef enum ObjectClass
/* in dependency.c */
#define PERFORM_DELETION_INTERNAL 0x0001
+#define PERFORM_DELETION_CONCURRENTLY 0x0002
extern void performDeletion(const ObjectAddress *object,
DropBehavior behavior, int flags);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index c7f1dd2..3f73a6c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -63,7 +63,7 @@ extern void index_constraint_create(Relation heapRelation,
bool update_pgindex,
bool allow_system_table_mods);
-extern void index_drop(Oid indexId);
+extern void index_drop(Oid indexId, bool concurrent);
extern IndexInfo *BuildIndexInfo(Relation index);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 1d33ceb..1c09042 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1908,6 +1908,7 @@ typedef struct DropStmt
ObjectType removeType; /* object type */
DropBehavior behavior; /* RESTRICT or CASCADE behavior */
bool missing_ok; /* skip error if object is missing? */
+ bool concurrent; /* drop index concurrently? */
} DropStmt;
/* ----------------------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index b1fcada..d2381e1 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2204,6 +2204,34 @@ Indexes:
"concur_index5" btree (f2) WHERE f1 = 'x'::text
"std_index" btree (f2)
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2"; -- works
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index2"; -- notice
+NOTICE: index "concur_index2" does not exist, skipping
+-- failures
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+ERROR: DROP INDEX CONCURRENTLY does not support dropping multiple objects
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ERROR: DROP INDEX CONCURRENTLY cannot run inside a transaction block
+ROLLBACK;
+-- successes
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+\d concur_heap
+Table "public.concur_heap"
+ Column | Type | Modifiers
+--------+------+-----------
+ f1 | text |
+ f2 | text |
+Indexes:
+ "std_index" btree (f2)
+
DROP TABLE concur_heap;
--
-- Test ADD CONSTRAINT USING INDEX
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 5e5fc22..2201598 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -695,6 +695,27 @@ COMMIT;
\d concur_heap
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2"; -- works
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index2"; -- notice
+
+-- failures
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ROLLBACK;
+
+-- successes
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+
+\d concur_heap
+
DROP TABLE concur_heap;
--
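The two-stage scheme the patch implements — mark the index invalid and commit, then wait out only the transactions that could still have the old index list — can be sketched abstractly. The following toy Python model is an illustration of that ordering argument only (none of these names exist in PostgreSQL; the real code waits via VirtualXactLock on lock conflicts, not a list walk):

```python
# Toy model of the two-phase concurrent index drop (illustration only).
class Transaction:
    def __init__(self, xid, sees_index):
        self.xid = xid
        self.sees_index = sees_index  # cached index list still includes it?
        self.finished = False

def concurrent_drop(index, active_txns):
    # Phase 1: catalog update; committed, so anyone starting later
    # rebuilds their index list and never sees this index.
    index["valid"] = False
    # Phase 2: wait out only the transactions that were already active
    # and could still see the index (stand-in for VirtualXactLock()).
    blockers = [t for t in active_txns if t.sees_index]
    for t in blockers:
        t.finished = True
    assert all(t.finished for t in blockers)
    index["dropped"] = True  # now safe to unlink the physical files
    return index

idx = {"valid": True, "dropped": False}
txns = [Transaction(1, True), Transaction(2, False)]
result = concurrent_drop(idx, txns)
```

The point of the model is that nothing ever has to wait for transactions that begin after the phase-1 commit, which is why only a short catalog transaction needs a strong lock.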
On Wed, Feb 1, 2012 at 2:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I improved the grammar issues in the attached version of the patch -
> the syntax is now simpler and more consistent, IF EXISTS now works,
> and RESTRICT is accepted (without changing the behavior) while CASCADE
> fails with a nicer error message. I also fixed a bug in
> RangeVarCallbackForDropRelation.
Plus tests as well. Many thanks.
I fixed the main bug you observed and your test case now works
perfectly. I used pgbench to continuously drop/create an index, so a
little more than manual testing.
v5 Attached.
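For anyone wanting to reproduce that kind of stress test, one plausible setup (the exact script is not shown in the thread; the file name, index name, database name, and durations below are guesses) pairs a standard pgbench workload with a second pgbench session that loops a concurrent drop/create:

```shell
# Hypothetical reproduction sketch; assumes a database "bench" already
# initialized with "pgbench -i bench".
cat > drop_create.sql <<'EOF'
DROP INDEX CONCURRENTLY IF EXISTS pgbench_accounts_bid_idx;
CREATE INDEX CONCURRENTLY pgbench_accounts_bid_idx ON pgbench_accounts (bid);
EOF

# Run the normal TPC-B-ish workload in the background...
pgbench -n -c 8 -T 60 bench &

# ...while a second session continuously drops and recreates the index.
# pgbench sends each script command outside an explicit transaction
# block, which DROP INDEX CONCURRENTLY requires.
pgbench -n -f drop_create.sql -c 1 -T 60 bench
wait
```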
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachments:
drop_index_concurrently.v5.patch (text/x-diff; charset=US-ASCII)
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 7177ef2..343f7ac 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
+DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ...] [ CASCADE | RESTRICT ]
</synopsis>
</refsynopsisdiv>
@@ -50,6 +50,29 @@ DROP INDEX [ IF EXISTS ] <replaceable class="PARAMETER">name</replaceable> [, ..
</varlistentry>
<varlistentry>
+ <term><literal>CONCURRENTLY</literal></term>
+ <listitem>
+ <para>
+ When this option is used, <productname>PostgreSQL</> will drop the
+ index without taking any locks that prevent concurrent selects, inserts,
+ updates, or deletes on the table; whereas a standard index drop
+ waits for a lock that locks out everything on the table until it's done.
+ A concurrent index drop is a two-stage process. First, we mark the
+ index both invalid and not ready, and commit that change. Next, we wait
+ until no transaction that can still see the index holds a lock on the table.
+ </para>
+ <para>
+ There are several caveats to be aware of when using this option.
+ Only one index name can be specified when the <literal>CONCURRENTLY</literal>
+ parameter is used. A regular <command>DROP INDEX</> command can be
+ performed within a transaction block, but
+ <command>DROP INDEX CONCURRENTLY</> cannot.
+ The <literal>CASCADE</literal> option is not supported when dropping an index concurrently.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="PARAMETER">name</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index db86262..58d8f5c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -173,8 +173,8 @@ static void reportDependentObjects(const ObjectAddresses *targetObjects,
const ObjectAddress *origObject);
static void deleteOneObject(const ObjectAddress *object,
Relation depRel, int32 flags);
-static void doDeletion(const ObjectAddress *object);
-static void AcquireDeletionLock(const ObjectAddress *object);
+static void doDeletion(const ObjectAddress *object, int flags);
+static void AcquireDeletionLock(const ObjectAddress *object, int flags);
static void ReleaseDeletionLock(const ObjectAddress *object);
static bool find_expr_references_walker(Node *node,
find_expr_references_context *context);
@@ -232,7 +232,7 @@ performDeletion(const ObjectAddress *object,
* Acquire deletion lock on the target object. (Ideally the caller has
* done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(object);
+ AcquireDeletionLock(object, 0);
/*
* Construct a list of objects to delete (ie, the given object plus
@@ -316,7 +316,7 @@ performMultipleDeletions(const ObjectAddresses *objects,
* Acquire deletion lock on each target object. (Ideally the caller
* has done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(thisobj);
+ AcquireDeletionLock(thisobj, flags);
findDependentObjects(thisobj,
DEPFLAG_ORIGINAL,
@@ -350,7 +350,11 @@ performMultipleDeletions(const ObjectAddresses *objects,
/* And clean up */
free_object_addresses(targetObjects);
- heap_close(depRel, RowExclusiveLock);
+ /*
+ * We closed depRel earlier in deleteOneObject if doing a drop concurrently
+ */
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) != PERFORM_DELETION_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
}
/*
@@ -380,7 +384,7 @@ deleteWhatDependsOn(const ObjectAddress *object,
* Acquire deletion lock on the target object. (Ideally the caller has
* done this already, but many places are sloppy about it.)
*/
- AcquireDeletionLock(object);
+ AcquireDeletionLock(object, 0);
/*
* Construct a list of objects to delete (ie, the given object plus
@@ -611,7 +615,7 @@ findDependentObjects(const ObjectAddress *object,
* deletion of the owning object.)
*/
ReleaseDeletionLock(object);
- AcquireDeletionLock(&otherObject);
+ AcquireDeletionLock(&otherObject, 0);
/*
* The owning object might have been deleted while we waited
@@ -706,7 +710,7 @@ findDependentObjects(const ObjectAddress *object,
/*
* Must lock the dependent object before recursing to it.
*/
- AcquireDeletionLock(&otherObject);
+ AcquireDeletionLock(&otherObject, 0);
/*
* The dependent object might have been deleted while we waited to
@@ -1016,9 +1020,16 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
object->objectSubId);
/*
+ * For a concurrent drop, close depRel now, because index_drop()
+ * will commit the transaction and we must not keep a dangling reference.
+ */
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) == PERFORM_DELETION_CONCURRENTLY)
+ heap_close(depRel, RowExclusiveLock);
+
+ /*
* Now delete the object itself, in an object-type-dependent way.
*/
- doDeletion(object);
+ doDeletion(object, flags);
/*
* Delete any comments or security labels associated with this object.
@@ -1043,7 +1054,7 @@ deleteOneObject(const ObjectAddress *object, Relation depRel, int flags)
* doDeletion: actually delete a single object
*/
static void
-doDeletion(const ObjectAddress *object)
+doDeletion(const ObjectAddress *object, int flags)
{
switch (getObjectClass(object))
{
@@ -1053,8 +1064,11 @@ doDeletion(const ObjectAddress *object)
if (relKind == RELKIND_INDEX)
{
+ bool concurrent = ((flags & PERFORM_DELETION_CONCURRENTLY)
+ == PERFORM_DELETION_CONCURRENTLY);
+
Assert(object->objectSubId == 0);
- index_drop(object->objectId);
+ index_drop(object->objectId, concurrent);
}
else
{
@@ -1190,10 +1204,15 @@ doDeletion(const ObjectAddress *object)
* shared-across-databases object, so we have no need for LockSharedObject.
*/
static void
-AcquireDeletionLock(const ObjectAddress *object)
+AcquireDeletionLock(const ObjectAddress *object, int flags)
{
if (object->classId == RelationRelationId)
- LockRelationOid(object->objectId, AccessExclusiveLock);
+ {
+ if ((flags & PERFORM_DELETION_CONCURRENTLY) == PERFORM_DELETION_CONCURRENTLY)
+ LockRelationOid(object->objectId, ShareUpdateExclusiveLock);
+ else
+ LockRelationOid(object->objectId, AccessExclusiveLock);
+ }
else
/* assume we should lock the whole object not a sub-object */
LockDatabaseObject(object->classId, object->objectId, 0,
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bfbe642..5fae488 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1282,7 +1282,7 @@ index_constraint_create(Relation heapRelation,
* else associated dependencies won't be cleaned up.
*/
void
-index_drop(Oid indexId)
+index_drop(Oid indexId, bool concurrent)
{
Oid heapId;
Relation userHeapRelation;
@@ -1290,6 +1290,12 @@ index_drop(Oid indexId)
Relation indexRelation;
HeapTuple tuple;
bool hasexprs;
+ LockRelId heaprelid,
+ indexrelid;
+ LOCKTAG heaplocktag,
+ indexlocktag;
+ VirtualTransactionId *old_lockholders;
+ Form_pg_index indexForm;
/*
* To drop an index safely, we must grab exclusive lock on its parent
@@ -1302,17 +1308,129 @@ index_drop(Oid indexId)
* that will make them update their index lists.
*/
heapId = IndexGetRelation(indexId, false);
- userHeapRelation = heap_open(heapId, AccessExclusiveLock);
-
- userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ if (concurrent)
+ {
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, ShareUpdateExclusiveLock);
+ }
+ else
+ {
+ userHeapRelation = heap_open(heapId, AccessExclusiveLock);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ }
/*
- * There can no longer be anyone *else* touching the index, but we might
- * still have open queries using it in our own session.
+ * We might still have open queries using it in our own session.
*/
CheckTableNotInUse(userIndexRelation, "DROP INDEX");
/*
+ * Dropping an index concurrently is similar in many ways to creating
+ * one concurrently, so some of the actions here mirror DefineIndex()
+ */
+ if (concurrent)
+ {
+ /*
+ * Mark index invalid by updating its pg_index entry
+ *
+ * Don't Assert(indexForm->indisvalid) because we may be trying to
+ * clear up after an error when trying to create an index which left
+ * the index invalid
+ */
+ indexRelation = heap_open(IndexRelationId, RowExclusiveLock);
+
+ tuple = SearchSysCacheCopy1(INDEXRELID,
+ ObjectIdGetDatum(indexId));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for index %u", indexId);
+ indexForm = (Form_pg_index) GETSTRUCT(tuple);
+
+ indexForm->indisvalid = false; /* make unusable for queries */
+ indexForm->indisready = false; /* make invisible to changes */
+
+ simple_heap_update(indexRelation, &tuple->t_self, tuple);
+ CatalogUpdateIndexes(indexRelation, tuple);
+
+ heap_close(indexRelation, RowExclusiveLock);
+
+ /*
+ * Invalidate the relcache for the table, so that after this
+ * transaction we will refresh the index list. Forgetting just the
+ * index is not enough.
+ */
+ CacheInvalidateRelcache(userHeapRelation);
+
+ /* save lockrelid and locktag for below, then close but keep locks */
+ heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
+ heap_close(userHeapRelation, NoLock);
+
+ indexrelid = userIndexRelation->rd_lockInfo.lockRelId;
+ SET_LOCKTAG_RELATION(indexlocktag, indexrelid.dbId, indexrelid.relId);
+ index_close(userIndexRelation, NoLock);
+
+ /*
+ * For a concurrent drop, it's important to make the catalog entries
+ * visible to other transactions before we drop the index. The index
+ * will be marked not indisvalid, so that no one else tries to either
+ * insert into it or use it for queries.
+ *
+ * We must commit our current transaction so that the index update becomes
+ * visible; then start another. Note that all the data structures we just
+ * built are lost in the commit. The only data we keep past here are the
+ * relation IDs.
+ *
+ * Before committing, get a session-level lock on the table, to ensure
+ * that neither it nor the index can be dropped before we finish. This
+ * cannot block, even if someone else is waiting for access, because we
+ * already have the same lock within our transaction.
+ */
+ LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ LockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+
+ PopActiveSnapshot();
+ CommitTransactionCommand();
+ StartTransactionCommand();
+
+ /*
+ * Now we must wait until no running transaction could have the table open
+ * with the old list of indexes. To do this, inquire which xacts
+ * currently would conflict with AccessExclusiveLock on the table -- ie,
+ * which ones have a lock of any kind on the table. Then wait for each of
+ * these xacts to commit or abort. Note we do not need to worry about
+ * xacts that open the table for writing after this point; they will see
+ * the index as invalid when they open the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than just
+ * checking the ProcArray and sleeping, is that deadlock is possible if
+ * one of the transactions in question is blocked trying to acquire an
+ * exclusive lock on our table. The lock code will detect deadlock and
+ * error out properly.
+ *
+ * Note: GetLockConflicts() never reports our own xid, hence we need not
+ * check for that. Also, prepared xacts are not reported, which is fine
+ * since they certainly aren't going to do anything more.
+ */
+ old_lockholders = GetLockConflicts(&heaplocktag, AccessExclusiveLock);
+
+ while (VirtualTransactionIdIsValid(*old_lockholders))
+ {
+ VirtualXactLock(*old_lockholders, true);
+ old_lockholders++;
+ }
+
+ /*
+ * Re-open relations to allow us to complete our actions.
+ *
+ * At this point, nothing should be accessing the index, but let's
+ * leave nothing to chance and grab AccessExclusiveLock on the index
+ * before the physical deletion.
+ */
+ userHeapRelation = heap_open(heapId, ShareUpdateExclusiveLock);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ }
+
+ /*
* All predicate locks on the index are about to be made invalid. Promote
* them to relation locks on the heap.
*/
@@ -1378,6 +1496,15 @@ index_drop(Oid indexId)
* Close owning rel, but keep lock
*/
heap_close(userHeapRelation, NoLock);
+
+ /*
+ * Release the session locks before we go.
+ */
+ if (concurrent)
+ {
+ UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+ UnlockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+ }
}
/* ----------------------------------------------------------------
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 07dc326..751c9b7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -239,6 +239,7 @@ struct DropRelationCallbackState
{
char relkind;
Oid heapOid;
+ bool concurrent;
};
/* Alter table target-type flags for ATSimplePermissions */
@@ -735,6 +736,21 @@ RemoveRelations(DropStmt *drop)
ObjectAddresses *objects;
char relkind;
ListCell *cell;
+ int flags = 0;
+ LOCKMODE lockmode = AccessExclusiveLock;
+
+ if (drop->concurrent)
+ {
+ lockmode = ShareUpdateExclusiveLock;
+ if (list_length(drop->objects) > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("DROP INDEX CONCURRENTLY does not support dropping multiple objects")));
+ if (drop->behavior == DROP_CASCADE)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("DROP INDEX CONCURRENTLY does not support CASCADE")));
+ }
/*
* First we identify all the relations, then we delete them in a single
@@ -797,7 +813,8 @@ RemoveRelations(DropStmt *drop)
/* Look up the appropriate relation using namespace search. */
state.relkind = relkind;
state.heapOid = InvalidOid;
- relOid = RangeVarGetRelidExtended(rel, AccessExclusiveLock, true,
+ state.concurrent = drop->concurrent;
+ relOid = RangeVarGetRelidExtended(rel, lockmode, true,
false,
RangeVarCallbackForDropRelation,
(void *) &state);
@@ -817,7 +834,20 @@ RemoveRelations(DropStmt *drop)
add_exact_object_address(&obj, objects);
}
- performMultipleDeletions(objects, drop->behavior, 0);
+ /*
+ * Set options and check further requirements for concurrent drop
+ */
+ if (drop->concurrent)
+ {
+ /*
+ * Check that the grammar only allows CONCURRENTLY with DROP INDEX.
+ */
+ Assert(drop->removeType == OBJECT_INDEX);
+
+ flags |= PERFORM_DELETION_CONCURRENTLY;
+ }
+
+ performMultipleDeletions(objects, drop->behavior, flags);
free_object_addresses(objects);
}
@@ -834,9 +864,12 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
struct DropRelationCallbackState *state;
char relkind;
Form_pg_class classform;
+ LOCKMODE heap_lockmode;
state = (struct DropRelationCallbackState *) arg;
relkind = state->relkind;
+ heap_lockmode = state->concurrent ?
+ ShareUpdateExclusiveLock : AccessExclusiveLock;
/*
* If we previously locked some other index's heap, and the name we're
@@ -845,7 +878,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
*/
if (relOid != oldRelOid && OidIsValid(state->heapOid))
{
- UnlockRelationOid(state->heapOid, AccessExclusiveLock);
+ UnlockRelationOid(state->heapOid, heap_lockmode);
state->heapOid = InvalidOid;
}
@@ -886,7 +919,7 @@ RangeVarCallbackForDropRelation(const RangeVar *rel, Oid relOid, Oid oldRelOid,
{
state->heapOid = IndexGetRelation(relOid, true);
if (OidIsValid(state->heapOid))
- LockRelationOid(state->heapOid, AccessExclusiveLock);
+ LockRelationOid(state->heapOid, heap_lockmode);
}
}
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index cc3168d..3c60dfb 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2763,6 +2763,7 @@ _copyDropStmt(const DropStmt *from)
COPY_SCALAR_FIELD(removeType);
COPY_SCALAR_FIELD(behavior);
COPY_SCALAR_FIELD(missing_ok);
+ COPY_SCALAR_FIELD(concurrent);
return newnode;
}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 2295195..597e2a7 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1189,6 +1189,7 @@ _equalDropStmt(const DropStmt *a, const DropStmt *b)
COMPARE_SCALAR_FIELD(removeType);
COMPARE_SCALAR_FIELD(behavior);
COMPARE_SCALAR_FIELD(missing_ok);
+ COMPARE_SCALAR_FIELD(concurrent);
return true;
}
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 62fde67..eab00d7 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -3271,6 +3271,7 @@ DropPLangStmt:
n->arguments = NIL;
n->behavior = $5;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP opt_procedural LANGUAGE IF_P EXISTS ColId_or_Sconst opt_drop_behavior
@@ -3280,6 +3281,7 @@ DropPLangStmt:
n->objects = list_make1(list_make1(makeString($6)));
n->behavior = $7;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -3675,6 +3677,7 @@ DropFdwStmt: DROP FOREIGN DATA_P WRAPPER name opt_drop_behavior
n->arguments = NIL;
n->missing_ok = false;
n->behavior = $6;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP FOREIGN DATA_P WRAPPER IF_P EXISTS name opt_drop_behavior
@@ -3685,6 +3688,7 @@ DropFdwStmt: DROP FOREIGN DATA_P WRAPPER name opt_drop_behavior
n->arguments = NIL;
n->missing_ok = true;
n->behavior = $8;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -3835,6 +3839,7 @@ DropForeignServerStmt: DROP SERVER name opt_drop_behavior
n->arguments = NIL;
n->missing_ok = false;
n->behavior = $4;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP SERVER IF_P EXISTS name opt_drop_behavior
@@ -3845,6 +3850,7 @@ DropForeignServerStmt: DROP SERVER name opt_drop_behavior
n->arguments = NIL;
n->missing_ok = true;
n->behavior = $6;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -4232,6 +4238,7 @@ DropTrigStmt:
n->arguments = NIL;
n->behavior = $6;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP TRIGGER IF_P EXISTS name ON any_name opt_drop_behavior
@@ -4242,6 +4249,7 @@ DropTrigStmt:
n->arguments = NIL;
n->behavior = $8;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -4702,6 +4710,7 @@ DropOpClassStmt:
n->removeType = OBJECT_OPCLASS;
n->behavior = $7;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP OPERATOR CLASS IF_P EXISTS any_name USING access_method opt_drop_behavior
@@ -4712,6 +4721,7 @@ DropOpClassStmt:
n->removeType = OBJECT_OPCLASS;
n->behavior = $9;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -4725,6 +4735,7 @@ DropOpFamilyStmt:
n->removeType = OBJECT_OPFAMILY;
n->behavior = $7;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP OPERATOR FAMILY IF_P EXISTS any_name USING access_method opt_drop_behavior
@@ -4735,6 +4746,7 @@ DropOpFamilyStmt:
n->removeType = OBJECT_OPFAMILY;
n->behavior = $9;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
@@ -4785,6 +4797,7 @@ DropStmt: DROP drop_type IF_P EXISTS any_name_list opt_drop_behavior
n->objects = $5;
n->arguments = NIL;
n->behavior = $6;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP drop_type any_name_list opt_drop_behavior
@@ -4795,6 +4808,29 @@ DropStmt: DROP drop_type IF_P EXISTS any_name_list opt_drop_behavior
n->objects = $3;
n->arguments = NIL;
n->behavior = $4;
+ n->concurrent = false;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX CONCURRENTLY any_name_list opt_drop_behavior
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = FALSE;
+ n->objects = $4;
+ n->arguments = NIL;
+ n->behavior = $5;
+ n->concurrent = true;
+ $$ = (Node *)n;
+ }
+ | DROP INDEX CONCURRENTLY IF_P EXISTS any_name_list opt_drop_behavior
+ {
+ DropStmt *n = makeNode(DropStmt);
+ n->removeType = OBJECT_INDEX;
+ n->missing_ok = TRUE;
+ n->objects = $6;
+ n->arguments = NIL;
+ n->behavior = $7;
+ n->concurrent = true;
$$ = (Node *)n;
}
;
@@ -6233,6 +6269,7 @@ RemoveFuncStmt:
n->arguments = list_make1(extractArgTypes($4));
n->behavior = $5;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP FUNCTION IF_P EXISTS func_name func_args opt_drop_behavior
@@ -6243,6 +6280,7 @@ RemoveFuncStmt:
n->arguments = list_make1(extractArgTypes($6));
n->behavior = $7;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -6256,6 +6294,7 @@ RemoveAggrStmt:
n->arguments = list_make1($4);
n->behavior = $5;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP AGGREGATE IF_P EXISTS func_name aggr_args opt_drop_behavior
@@ -6266,6 +6305,7 @@ RemoveAggrStmt:
n->arguments = list_make1($6);
n->behavior = $7;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -6279,6 +6319,7 @@ RemoveOperStmt:
n->arguments = list_make1($4);
n->behavior = $5;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *)n;
}
| DROP OPERATOR IF_P EXISTS any_operator oper_argtypes opt_drop_behavior
@@ -6289,6 +6330,7 @@ RemoveOperStmt:
n->arguments = list_make1($6);
n->behavior = $7;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -6405,6 +6447,7 @@ DropCastStmt: DROP CAST opt_if_exists '(' Typename AS Typename ')' opt_drop_beha
n->arguments = list_make1(list_make1($7));
n->behavior = $9;
n->missing_ok = $3;
+ n->concurrent = false;
$$ = (Node *)n;
}
;
@@ -7306,6 +7349,7 @@ DropRuleStmt:
n->arguments = NIL;
n->behavior = $6;
n->missing_ok = false;
+ n->concurrent = false;
$$ = (Node *) n;
}
| DROP RULE IF_P EXISTS name ON any_name opt_drop_behavior
@@ -7316,6 +7360,7 @@ DropRuleStmt:
n->arguments = NIL;
n->behavior = $8;
n->missing_ok = true;
+ n->concurrent = false;
$$ = (Node *) n;
}
;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 5b81c0b..1e1d7a4 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -631,10 +631,15 @@ standard_ProcessUtility(Node *parsetree,
case T_DropStmt:
switch (((DropStmt *) parsetree)->removeType)
{
+ case OBJECT_INDEX:
+ if (((DropStmt *) parsetree)->concurrent)
+ PreventTransactionChain(isTopLevel,
+ "DROP INDEX CONCURRENTLY");
+ /* fall through */
+
case OBJECT_TABLE:
case OBJECT_SEQUENCE:
case OBJECT_VIEW:
- case OBJECT_INDEX:
case OBJECT_FOREIGN_TABLE:
RemoveRelations((DropStmt *) parsetree);
break;
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index a59950e..9cadb3f 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -3355,6 +3355,12 @@ RelationGetIndexList(Relation relation)
oidvector *indclass;
bool isnull;
+ /*
+ * Ignore any indexes that are currently being dropped
+ */
+ if (!index->indisvalid && !index->indisready)
+ continue;
+
/* Add index's OID to result list in the proper order */
result = insert_ordered_oid(result, index->indexrelid);
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index 28e68c5..f0eb564 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -153,6 +153,7 @@ typedef enum ObjectClass
/* in dependency.c */
#define PERFORM_DELETION_INTERNAL 0x0001
+#define PERFORM_DELETION_CONCURRENTLY 0x0002
extern void performDeletion(const ObjectAddress *object,
DropBehavior behavior, int flags);
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index c7f1dd2..3f73a6c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -63,7 +63,7 @@ extern void index_constraint_create(Relation heapRelation,
bool update_pgindex,
bool allow_system_table_mods);
-extern void index_drop(Oid indexId);
+extern void index_drop(Oid indexId, bool concurrent);
extern IndexInfo *BuildIndexInfo(Relation index);
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 1d33ceb..1c09042 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1908,6 +1908,7 @@ typedef struct DropStmt
ObjectType removeType; /* object type */
DropBehavior behavior; /* RESTRICT or CASCADE behavior */
bool missing_ok; /* skip error if object is missing? */
+ bool concurrent; /* drop index concurrently? */
} DropStmt;
/* ----------------------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index b1fcada..d2381e1 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2204,6 +2204,34 @@ Indexes:
"concur_index5" btree (f2) WHERE f1 = 'x'::text
"std_index" btree (f2)
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2"; -- works
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index2"; -- notice
+NOTICE: index "concur_index2" does not exist, skipping
+-- failures
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+ERROR: DROP INDEX CONCURRENTLY does not support dropping multiple objects
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ERROR: DROP INDEX CONCURRENTLY cannot run inside a transaction block
+ROLLBACK;
+-- successes
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+\d concur_heap
+Table "public.concur_heap"
+ Column | Type | Modifiers
+--------+------+-----------
+ f1 | text |
+ f2 | text |
+Indexes:
+ "std_index" btree (f2)
+
DROP TABLE concur_heap;
--
-- Test ADD CONSTRAINT USING INDEX
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 5e5fc22..2201598 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -695,6 +695,27 @@ COMMIT;
\d concur_heap
+--
+-- Try some concurrent index drops
+--
+DROP INDEX CONCURRENTLY "concur_index2"; -- works
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index2"; -- notice
+
+-- failures
+DROP INDEX CONCURRENTLY "concur_index2", "concur_index3";
+BEGIN;
+DROP INDEX CONCURRENTLY "concur_index5";
+ROLLBACK;
+
+-- successes
+DROP INDEX CONCURRENTLY IF EXISTS "concur_index3";
+DROP INDEX CONCURRENTLY "concur_index4";
+DROP INDEX CONCURRENTLY "concur_index5";
+DROP INDEX CONCURRENTLY "concur_index1";
+DROP INDEX CONCURRENTLY "concur_heap_expr_idx";
+
+\d concur_heap
+
DROP TABLE concur_heap;
--
On sön, 2012-01-29 at 22:01 +0000, Simon Riggs wrote:
Patch now locks index in AccessExclusiveLock in final stage of drop.
Doesn't that defeat the point of doing the CONCURRENTLY business in the
first place?
On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On sön, 2012-01-29 at 22:01 +0000, Simon Riggs wrote:
Patch now locks index in AccessExclusiveLock in final stage of drop.
Doesn't that defeat the point of doing the CONCURRENTLY business in the
first place?
That was my initial reaction.
We lock the index in AccessExclusiveLock only once we are certain
nobody else is looking at it any more.
So it's a Kansas City Shuffle, with safe locking in case of people
doing strange low level things.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On sön, 2012-01-29 at 22:01 +0000, Simon Riggs wrote:
Patch now locks index in AccessExclusiveLock in final stage of drop.
Doesn't that defeat the point of doing the CONCURRENTLY business in the
first place?
That was my initial reaction.
We lock the index in AccessExclusiveLock only once we are certain
nobody else is looking at it any more.
So it's a Kansas City Shuffle, with safe locking in case of people
doing strange low level things.
Yeah, I think this is much safer, and in this version that doesn't
seem to harm concurrency.
Given our previous experiences in this area, I wouldn't like to bet my
life savings on this having no remaining bugs - but if it does, I
can't find them.
I'll mark this "Ready for Committer".
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sun, Jan 8, 2012 at 8:19 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, Jan 4, 2012 at 11:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
Not having a freelist at all is probably a simpler way of avoiding the
lock contention, so I'll happily back that suggestion instead. Patch
attached, previous patch revoked.
v2 attached with cleanup of some random stuff that crept onto patch.
Hi Simon,
Based on the way this patch leaves the old code behind (using #ifdef),
this looks more like a WIP patch which you want people to do
performance testing with, rather than patch proposed for committing.
If that is the case, could you outline the type of performance testing
where you think it would make a difference (and whether it should be
done on top of the main patch from this thread, the concurrent index
drop one)?
Also, it would be much easier to do the performance testing if this
behavior was controlled by a temporarily added GUC, rather than an
#ifdef. Do you think it is feasible to do that, or would the overhead
of a single "if (some_guc)" per StrategyGetBuffer and
StrategyFreeBuffer call distort things too much?
Cheers,
Jeff
On Fri, Feb 3, 2012 at 10:28 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 1, 2012 at 2:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, Feb 1, 2012 at 7:31 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
On sön, 2012-01-29 at 22:01 +0000, Simon Riggs wrote:
Patch now locks index in AccessExclusiveLock in final stage of drop.
Doesn't that defeat the point of doing the CONCURRENTLY business in the
first place?
That was my initial reaction.
We lock the index in AccessExclusiveLock only once we are certain
nobody else is looking at it any more.
So it's a Kansas City Shuffle, with safe locking in case of people
doing strange low level things.
Yeah, I think this is much safer, and in this version that doesn't
seem to harm concurrency.
Given our previous experiences in this area, I wouldn't like to bet my
life savings on this having no remaining bugs - but if it does, I
can't find them.
I'll mark this "Ready for Committer".
I don't think this has been committed - does that mean you've decided
against doing so, or just haven't had the round tuits?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Aug 24, 2011 at 03:38:09PM -0400, Tom Lane wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel@heroku.com> wrote:
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~=< a dozen).
Are you sure that you are really waiting on the time to unlink the
file? there's other stuff going on in there like waiting for lock,
plan invalidation, etc. Point being, maybe the time consuming stuff
can't really be deferred which would make the proposal moot.
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
I'd believe it. With a cold cache (sudo sysctl -w vm.drop_caches=3), my
desktop takes 2.6s to "rm" a 985 MiB file.
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have
reviewed the code that runs between the old and new call sites, and I did not
identify a hazard of moving it as you describe.
nm
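Noah's cold-cache figure is easy to check for yourself. A rough sketch (file size and name are arbitrary choices for illustration; the page-cache eviction step needs root, so it is left commented out):

```shell
#!/bin/sh
# Rough reproduction of the cold-cache unlink timing mentioned above.
# Create a large-ish file, sync it out, then time its removal.

f=unlink_test_file
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null
sync

# For the cold-cache case, evict the page cache first (requires root):
# sysctl -w vm.drop_caches=3

time rm "$f"
test ! -e "$f" && echo "removed"
```

On a filesystem with many extents to free (or a slow storage stack like the EBS-backed setup described at the top of the thread), the `rm` step is where the multi-second stalls show up.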
Attachments:
unlink-after-lockrel-v1.patch (text/plain; charset=us-ascii)
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 8f00186..8e4a455 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
***************
*** 1944,1957 **** CommitTransaction(void)
*/
AtEOXact_Inval(true);
- /*
- * Likewise, dropping of files deleted during the transaction is best done
- * after releasing relcache and buffer pins. (This is not strictly
- * necessary during commit, since such pins should have been released
- * already, but this ordering is definitely critical during abort.)
- */
- smgrDoPendingDeletes(true);
-
AtEOXact_MultiXact();
ResourceOwnerRelease(TopTransactionResourceOwner,
--- 1944,1949 ----
***************
*** 1961,1966 **** CommitTransaction(void)
--- 1953,1969 ----
RESOURCE_RELEASE_AFTER_LOCKS,
true, true);
+ /*
+ * Likewise, dropping of files deleted during the transaction is best done
+ * after releasing relcache and buffer pins. (This is not strictly
+ * necessary during commit, since such pins should have been released
+ * already, but this ordering is definitely critical during abort.) Since
+ * this may take many seconds, also delay until after releasing locks.
+ * Other backends will observe the attendant catalog changes and not
+ * attempt to access affected files.
+ */
+ smgrDoPendingDeletes(true);
+
/* Check we've released all catcache entries */
AtEOXact_CatCache(true);
***************
*** 2354,2360 **** AbortTransaction(void)
AtEOXact_Buffers(false);
AtEOXact_RelationCache(false);
AtEOXact_Inval(false);
- smgrDoPendingDeletes(false);
AtEOXact_MultiXact();
ResourceOwnerRelease(TopTransactionResourceOwner,
RESOURCE_RELEASE_LOCKS,
--- 2357,2362 ----
***************
*** 2362,2367 **** AbortTransaction(void)
--- 2364,2370 ----
ResourceOwnerRelease(TopTransactionResourceOwner,
RESOURCE_RELEASE_AFTER_LOCKS,
false, true);
+ smgrDoPendingDeletes(false);
AtEOXact_CatCache(false);
AtEOXact_GUC(false, 1);
***************
*** 4238,4250 **** AbortSubTransaction(void)
AtEOSubXact_RelationCache(false, s->subTransactionId,
s->parent->subTransactionId);
AtEOSubXact_Inval(false);
- AtSubAbort_smgr();
ResourceOwnerRelease(s->curTransactionOwner,
RESOURCE_RELEASE_LOCKS,
false, false);
ResourceOwnerRelease(s->curTransactionOwner,
RESOURCE_RELEASE_AFTER_LOCKS,
false, false);
AtEOXact_GUC(false, s->gucNestLevel);
AtEOSubXact_SPI(false, s->subTransactionId);
--- 4241,4253 ----
AtEOSubXact_RelationCache(false, s->subTransactionId,
s->parent->subTransactionId);
AtEOSubXact_Inval(false);
ResourceOwnerRelease(s->curTransactionOwner,
RESOURCE_RELEASE_LOCKS,
false, false);
ResourceOwnerRelease(s->curTransactionOwner,
RESOURCE_RELEASE_AFTER_LOCKS,
false, false);
+ AtSubAbort_smgr();
AtEOXact_GUC(false, s->gucNestLevel);
AtEOSubXact_SPI(false, s->subTransactionId);
On Sun, Jun 10, 2012 at 4:19 PM, Noah Misch <noah@leadboat.com> wrote:
On Wed, Aug 24, 2011 at 03:38:09PM -0400, Tom Lane wrote:
Merlin Moncure <mmoncure@gmail.com> writes:
On Wed, Aug 24, 2011 at 1:24 PM, Daniel Farina <daniel@heroku.com> wrote:
At Heroku we use CREATE INDEX CONCURRENTLY with great success, but
recently when frobbing around some indexes I realized that there is no
equivalent for DROP INDEX, and this is a similar but lesser problem
(as CREATE INDEX takes much longer), as DROP INDEX takes an ACCESS
EXCLUSIVE lock on the parent table while doing the work to unlink
files, which nominally one would think to be trivial, but I assure you
it is not at times for even indexes that are a handful of gigabytes
(let's say ~=< a dozen).
Are you sure that you are really waiting on the time to unlink the
file? there's other stuff going on in there like waiting for lock,
plan invalidation, etc. Point being, maybe the time consuming stuff
can't really be deferred which would make the proposal moot.
Assuming the issue really is the physical unlinks (which I agree I'd
like to see some evidence for), I wonder whether the problem could be
I'd believe it. With a cold cache (sudo sysctl -w vm.drop_caches=3), my
desktop takes 2.6s to "rm" a 985 MiB file.
addressed by moving smgrDoPendingDeletes() to after locks are released,
instead of before, in CommitTransaction/AbortTransaction. There does
not seem to be any strong reason why we have to do that before lock
release, since incoming potential users of a table should not be trying
to access the old physical storage after that anyway.
Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have
reviewed the code that runs between the old and new call sites, and I did not
identify a hazard of moving it as you describe.
I looked at this when we last discussed it and didn't see a problem
either, so I tend to agree that we ought to go ahead and do this,
unless someone else sees a problem. Holding locks for even slightly
less time is a good idea, and if it turns out to be substantially less
time in some cases, then we win more.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 10, 2012 at 4:19 PM, Noah Misch <noah@leadboat.com> wrote:
Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have
reviewed the code that runs between the old and new call sites, and I did not
identify a hazard of moving it as you describe.
I looked at this when we last discussed it and didn't see a problem
either, so I tend to agree that we ought to go ahead and do this,
+1, as long as you mean in 9.3 not 9.2. I don't see any risk either,
but the time for taking new risks in 9.2 is past.
Noah, please add this patch to the upcoming CF, if you didn't already.
regards, tom lane
On Sun, Jun 10, 2012 at 5:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
On Sun, Jun 10, 2012 at 4:19 PM, Noah Misch <noah@leadboat.com> wrote:
Agreed. We now have $OLD_SUBJECT, but this is a win independently. I have
reviewed the code that runs between the old and new call sites, and I did not
identify a hazard of moving it as you describe.
I looked at this when we last discussed it and didn't see a problem
either, so I tend to agree that we ought to go ahead and do this,
+1, as long as you mean in 9.3 not 9.2. I don't see any risk either,
but the time for taking new risks in 9.2 is past.
Noah, please add this patch to the upcoming CF, if you didn't already.
I re-reviewed this and committed it.
Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it
just for extensions?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it
just for extensions?
I'm too lazy to go look, but it certainly ought to be in use.
The idea is that that's the phase for post-lock-release cleanup,
and anything that can possibly be postponed till after releasing
locks certainly should be ...
regards, tom lane
On Thu, Jun 14, 2012 at 10:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything? Is it
just for extensions?
I'm too lazy to go look, but it certainly ought to be in use.
The idea is that that's the phase for post-lock-release cleanup,
and anything that can possibly be postponed till after releasing
locks certainly should be ...
Oh, you're right. I missed the logic in ResourceOwnerReleaseInternal.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company