DROP INDEX CONCURRENTLY on partitioned index
Forking this thread, since the existing CFs have been closed.
/messages/by-id/20200914143102.GX18552@telsasoft.com
On Mon, Sep 14, 2020 at 09:31:03AM -0500, Justin Pryzby wrote:
On Sat, Sep 12, 2020 at 10:35:34AM +0900, Michael Paquier wrote:
On Fri, Sep 11, 2020 at 07:13:01PM -0500, Justin Pryzby wrote:
On Tue, Sep 08, 2020 at 01:31:05PM +0900, Michael Paquier wrote:
- CIC on partitioned relations. (Should we also care about DROP INDEX
CONCURRENTLY as well?)

Do you have any idea what you think that should look like for DROP INDEX
CONCURRENTLY?

Making the maintenance of the partition tree consistent to the user is
the critical part here, so my guess on this matter is:
1) Remove each index from the partition tree and mark the indexes as
invalid in the same transaction. This makes sure that after commit no
indexes would get used for scans, and the partition dependency tree
is completely removed with the parent table. That's close to what we
do in index_concurrently_swap() except that we just want to remove the
dependencies with the partitions, and not just swap them of course.
2) Switch each index to INDEX_DROP_SET_DEAD, one per transaction
should be fine as that prevents inserts.
3) Finish the index drop.

Step 2) and 3) could be completely done for each index as part of
index_drop(). The tricky part is to integrate 1) cleanly within the
existing dependency machinery while still knowing about the list of
partitions that can be removed. I think that this part is not that
straight-forward, but perhaps we could just make this logic part of
RemoveRelations() when listing all the objects to remove.

Thanks.
I see three implementation ideas..
1. I think your approach has an issue: the dependencies are lost. If there's an
interruption, the user may be left with hundreds or thousands of detached
indexes to clean up. This is strange since there's actually no detach command
for indexes (but they're implicitly "attached" when a matching parent index is
created). A second issue is that DETACH currently requires an exclusive lock (but
see Alvaro's WIP patch).
I think this is a critical problem with this approach. It's not ok if a
failure leaves behind N partition indexes not attached to any parent.
They may have pretty different names, which is a mess to clean up.
2. Maybe the easiest way is to mark all indexes invalid and then drop all
partitions (concurrently) and then the partitioned parent. If interrupted,
this would leave a parent index marked "invalid", and some child tables with no
indexes. I think this may be "ok". The same thing is possible if a concurrent
build is interrupted, right?
I think adding the "invalid" mark in the simple/naive way isn't enough - it has
to do everything DROP INDEX CONCURRENTLY does (of course).
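For context, here is a rough sketch of the phases a plain DROP INDEX
CONCURRENTLY already goes through for a single index (as visible in
index_drop(); the internal steps are shown as comments, not literal SQL),
which idea 2 would have to replicate per partition:

```sql
-- Sketch of what DROP INDEX CONCURRENTLY does today for one index.
-- Each phase boundary is a transaction commit:
--
--   txn 1: set pg_index.indisvalid = false   -- planner stops using the index
--          (then wait out transactions holding any lock on the table)
--   txn 2: mark the index dead (indisready = false)
--          -- inserts stop maintaining it
--          (then wait again for transactions that saw the old state)
--   txn 3: take AccessExclusiveLock on the index, drop its storage,
--          and delete the catalog entries
drop index concurrently idxpart1_a_idx;
```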
3. I have a patch which changes index_drop() to "expand" a partitioned index into
its list of children. Each of these becomes a List:
| indexId, heapId, userIndexRelation, userHeapRelation, heaplocktag, heaprelid, indexrelid
The same process is followed as for a single index, but handling all partitions
at once in two transactions total. Arguably, this is bad since that function
currently takes a single Oid but would now end up operating on a list of indexes.
This is what's implemented in the attached. It's very possible I've missed
opportunities for better simplification/integration.
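For the user-visible behavior, a minimal usage sketch (table and index names
taken from the regression test updated by the patch):

```sql
create table idxpart (a int) partition by range (a);
create index on idxpart (a);
create table idxpart1 partition of idxpart for values from (0) to (10);

-- Previously rejected with:
--   ERROR: cannot drop partitioned index "idxpart_a_idx" concurrently
-- With the patch, the parent index and its child go away together:
drop index concurrently idxpart_a_idx;
```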
--
Justin
Attachments:
v1-0001-WIP-implement-DROP-INDEX-CONCURRENTLY-on-partitio.patch (text/x-diff)
From 990db2a4016be3e92abe53f0600368258d2c8744 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 8 Sep 2020 12:12:44 -0500
Subject: [PATCH v1] WIP: implement DROP INDEX CONCURRENTLY on partitioned
index
It's necessary to drop each of the partitions with the same concurrent logic as
for a normal index. But, they each depend on the parent, partitioned index,
and the dependency can't be removed first, since DROP CONCURRENTLY must be the
first command in a txn, and it would leave a mess if it weren't transactional.
Also, it's desirable that dropping N partitions concurrently require only two
transaction waits, and not 2*N.
So drop the parent and all its children in a single loop, handling the entire
list of indexes at each stage.
---
doc/src/sgml/ref/drop_index.sgml | 2 -
src/backend/catalog/dependency.c | 85 +++++-
src/backend/catalog/index.c | 384 ++++++++++++++++---------
src/backend/commands/tablecmds.c | 17 +-
src/include/catalog/dependency.h | 6 +
src/test/regress/expected/indexing.out | 4 +-
src/test/regress/sql/indexing.sql | 3 +-
7 files changed, 331 insertions(+), 170 deletions(-)
diff --git a/doc/src/sgml/ref/drop_index.sgml b/doc/src/sgml/ref/drop_index.sgml
index 85cf23bca2..0aedd71bd6 100644
--- a/doc/src/sgml/ref/drop_index.sgml
+++ b/doc/src/sgml/ref/drop_index.sgml
@@ -57,8 +57,6 @@ DROP INDEX [ CONCURRENTLY ] [ IF EXISTS ] <replaceable class="parameter">name</r
Also, regular <command>DROP INDEX</command> commands can be
performed within a transaction block, but
<command>DROP INDEX CONCURRENTLY</command> cannot.
- Lastly, indexes on partitioned tables cannot be dropped using this
- option.
</para>
<para>
For temporary tables, <command>DROP INDEX</command> is always
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index f515e2c308..b3877299f5 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -358,6 +358,50 @@ performDeletion(const ObjectAddress *object,
table_close(depRel, RowExclusiveLock);
}
+/*
+ * Concurrently drop a partitioned index.
+ */
+void
+performDeleteParentIndexConcurrent(const ObjectAddress *object,
+ DropBehavior behavior, int flags)
+{
+ Relation depRel;
+ ObjectAddresses *targetObjects;
+ ObjectAddressExtra extra = {
+ .flags = DEPFLAG_AUTO,
+ .dependee = *object,
+ };
+
+ /*
+ * Concurrently dropping partitioned index is special: it must avoid
+ * calling AcquireDeletionLock
+ */
+
+ Assert((flags & PERFORM_DELETION_CONCURRENTLY) != 0);
+ Assert(object->classId == RelationRelationId);
+ Assert(get_rel_relkind(object->objectId) == RELKIND_PARTITIONED_INDEX);
+
+ targetObjects = new_object_addresses();
+ add_exact_object_address_extra(object, &extra, targetObjects);
+
+ reportDependentObjects(targetObjects,
+ behavior,
+ flags,
+ object);
+
+ /*
+ * do the deed
+ * note that index_drop specially deletes dependencies of child indexes
+ */
+
+ depRel = table_open(DependRelationId, RowExclusiveLock);
+ deleteObjectsInList(targetObjects, &depRel, flags);
+ table_close(depRel, RowExclusiveLock);
+
+ /* And clean up */
+ free_object_addresses(targetObjects);
+}
+
/*
* performMultipleDeletions: Similar to performDeletion, but act on multiple
* objects at once.
@@ -1287,11 +1331,6 @@ DropObjectById(const ObjectAddress *object)
static void
deleteOneObject(const ObjectAddress *object, Relation *depRel, int flags)
{
- ScanKeyData key[3];
- int nkeys;
- SysScanDesc scan;
- HeapTuple tup;
-
/* DROP hook of the objects being removed */
InvokeObjectDropHookArg(object->classId, object->objectId,
object->objectSubId, flags);
@@ -1321,6 +1360,32 @@ deleteOneObject(const ObjectAddress *object, Relation *depRel, int flags)
if (flags & PERFORM_DELETION_CONCURRENTLY)
*depRel = table_open(DependRelationId, RowExclusiveLock);
+ deleteOneDepends(object, depRel, flags);
+
+ /*
+ * CommandCounterIncrement here to ensure that preceding changes are all
+ * visible to the next deletion step.
+ */
+ CommandCounterIncrement();
+
+ /*
+ * And we're done!
+ */
+}
+
+/*
+ * deleteOneDepends: delete dependencies and other 2ndary data for a single object
+ *
+ * *depRel is the already-open pg_depend relation.
+ */
+void
+deleteOneDepends(const ObjectAddress *object, Relation *depRel, int flags)
+{
+ ScanKeyData key[3];
+ int nkeys;
+ SysScanDesc scan;
+ HeapTuple tup;
+
/*
* Now remove any pg_depend records that link from this object to others.
* (Any records linking to this object should be gone already.)
@@ -1373,16 +1438,6 @@ deleteOneObject(const ObjectAddress *object, Relation *depRel, int flags)
DeleteComments(object->objectId, object->classId, object->objectSubId);
DeleteSecurityLabel(object);
DeleteInitPrivs(object);
-
- /*
- * CommandCounterIncrement here to ensure that preceding changes are all
- * visible to the next deletion step.
- */
- CommandCounterIncrement();
-
- /*
- * And we're done!
- */
}
/*
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ff684d9f12..92ca0927fa 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2011,17 +2011,27 @@ index_constraint_create(Relation heapRelation,
* CONCURRENTLY.
*/
void
-index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
+index_drop(Oid origindexId, bool concurrent, bool concurrent_lock_mode)
{
- Oid heapId;
- Relation userHeapRelation;
- Relation userIndexRelation;
+ bool is_partitioned;
+ /* These are lists to handle the case of partitioned indexes. In the
+ * normal case, they're length 1 */
+ List *indexIds;
+ List *heapIds = NIL;
+ List *userIndexRelations = NIL,
+ *userHeapRelations = NIL;
+ List *heaplocktags = NIL;
+ List *heaprelids = NIL,
+ *indexrelids = NIL;
+
+ ListCell *lc, *lc2, *lc3;
+
+ MemoryContext long_context = AllocSetContextCreate(TopMemoryContext,
+ "DROP INDEX", ALLOCSET_DEFAULT_SIZES);
+ MemoryContext old_context;
+
Relation indexRelation;
- HeapTuple tuple;
- bool hasexprs;
- LockRelId heaprelid,
- indexrelid;
- LOCKTAG heaplocktag;
+ Relation pg_depend;
LOCKMODE lockmode;
/*
@@ -2030,7 +2040,7 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
* lock (see comments in RemoveRelations), and a non-concurrent DROP is
* more efficient.
*/
- Assert(get_rel_persistence(indexId) != RELPERSISTENCE_TEMP ||
+ Assert(get_rel_persistence(origindexId) != RELPERSISTENCE_TEMP ||
(!concurrent && !concurrent_lock_mode));
/*
@@ -2051,16 +2061,38 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
* AccessExclusiveLock on the index below, once we're sure nobody else is
* using it.)
*/
- heapId = IndexGetRelation(indexId, false);
lockmode = (concurrent || concurrent_lock_mode) ? ShareUpdateExclusiveLock : AccessExclusiveLock;
- userHeapRelation = table_open(heapId, lockmode);
- userIndexRelation = index_open(indexId, lockmode);
- /*
- * We might still have open queries using it in our own session, which the
- * above locking won't prevent, so test explicitly.
- */
- CheckTableNotInUse(userIndexRelation, "DROP INDEX");
+ old_context = MemoryContextSwitchTo(long_context);
+ is_partitioned = (get_rel_relkind(origindexId) == RELKIND_PARTITIONED_INDEX);
+ if (is_partitioned)
+ /* This includes the index itself */
+ indexIds = find_all_inheritors(origindexId, lockmode, NULL);
+ else
+ indexIds = list_make1_oid(origindexId);
+
+ foreach(lc, indexIds)
+ {
+ Oid indexId = lfirst_oid(lc);
+ Relation userHeapRelation;
+ Relation userIndexRelation;
+ Oid heapId;
+
+ heapId = IndexGetRelation(indexId, false);
+ heapIds = lappend_oid(heapIds, heapId);
+
+ userHeapRelation = table_open(heapId, lockmode);
+ userHeapRelations = lappend(userHeapRelations, userHeapRelation);
+ userIndexRelation = index_open(indexId, lockmode);
+ userIndexRelations = lappend(userIndexRelations, userIndexRelation);
+
+ /*
+ * We might still have open queries using it in our own session, which the
+ * above locking won't prevent, so test explicitly.
+ */
+ CheckTableNotInUse(userIndexRelation, "DROP INDEX");
+ }
+ MemoryContextSwitchTo(old_context);
/*
* Drop Index Concurrently is more or less the reverse process of Create
@@ -2113,66 +2145,96 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("DROP INDEX CONCURRENTLY must be first action in transaction")));
- /*
- * Mark index invalid by updating its pg_index entry
- */
- index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
+ MemoryContextSwitchTo(long_context);
+ forthree(lc, indexIds, lc2, userHeapRelations, lc3, userIndexRelations)
+ {
+ Oid indexId = lfirst_oid(lc);
+ Relation userHeapRelation = lfirst(lc2);
+ Relation userIndexRelation = lfirst(lc3);
- /*
- * Invalidate the relcache for the table, so that after this commit
- * all sessions will refresh any cached plans that might reference the
- * index.
- */
- CacheInvalidateRelcache(userHeapRelation);
+ LOCKTAG *locktag;
+ LockRelId *heaprelid, *indexrelid;
- /* save lockrelid and locktag for below, then close but keep locks */
- heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
- SET_LOCKTAG_RELATION(heaplocktag, heaprelid.dbId, heaprelid.relId);
- indexrelid = userIndexRelation->rd_lockInfo.lockRelId;
+ /*
+ * Mark index invalid by updating its pg_index entry
+ */
+ index_set_state_flags(indexId, INDEX_DROP_CLEAR_VALID);
- table_close(userHeapRelation, NoLock);
- index_close(userIndexRelation, NoLock);
+ /*
+ * Invalidate the relcache for the table, so that after this commit
+ * all sessions will refresh any cached plans that might reference the
+ * index.
+ */
+ CacheInvalidateRelcache(userHeapRelation);
- /*
- * We must commit our current transaction so that the indisvalid
- * update becomes visible to other transactions; then start another.
- * Note that any previously-built data structures are lost in the
- * commit. The only data we keep past here are the relation IDs.
- *
- * Before committing, get a session-level lock on the table, to ensure
- * that neither it nor the index can be dropped before we finish. This
- * cannot block, even if someone else is waiting for access, because
- * we already have the same lock within our transaction.
- */
- LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
- LockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+ /* save lockrelid and locktag for below, then close but keep locks */
+ heaprelid = palloc(sizeof(*heaprelid));
+ *heaprelid = userHeapRelation->rd_lockInfo.lockRelId;
+
+ locktag = palloc(sizeof(*locktag));
+ SET_LOCKTAG_RELATION(*locktag, heaprelid->dbId, heaprelid->relId);
+ heaplocktags = lappend(heaplocktags, locktag);
+
+ indexrelid = palloc(sizeof(*indexrelid));
+ *indexrelid = userIndexRelation->rd_lockInfo.lockRelId;
+
+ heaprelids = lappend(heaprelids, heaprelid);
+ indexrelids = lappend(indexrelids, indexrelid);
+
+ table_close(userHeapRelation, NoLock);
+ index_close(userIndexRelation, NoLock);
+
+ /*
+ * We must commit our current transaction so that the indisvalid
+ * update becomes visible to other transactions; then start another.
+ *
+ * Before committing, get a session-level lock on the table, to ensure
+ * that neither it nor the index can be dropped before we finish. This
+ * cannot block, even if someone else is waiting for access, because
+ * we already have the same lock within our transaction.
+ */
+ LockRelationIdForSession(heaprelid, ShareUpdateExclusiveLock);
+ LockRelationIdForSession(indexrelid, ShareUpdateExclusiveLock);
+ }
+ MemoryContextSwitchTo(old_context);
+
+ list_free(userHeapRelations);
+ list_free(userIndexRelations);
+ userHeapRelations = userIndexRelations = NIL;
PopActiveSnapshot();
CommitTransactionCommand();
StartTransactionCommand();
- /*
- * Now we must wait until no running transaction could be using the
- * index for a query. Use AccessExclusiveLock here to check for
- * running transactions that hold locks of any kind on the table. Note
- * we do not need to worry about xacts that open the table for reading
- * after this point; they will see the index as invalid when they open
- * the relation.
- *
- * Note: the reason we use actual lock acquisition here, rather than
- * just checking the ProcArray and sleeping, is that deadlock is
- * possible if one of the transactions in question is blocked trying
- * to acquire an exclusive lock on our table. The lock code will
- * detect deadlock and error out properly.
- *
- * Note: we report progress through WaitForLockers() unconditionally
- * here, even though it will only be used when we're called by REINDEX
- * CONCURRENTLY and not when called by DROP INDEX CONCURRENTLY.
- */
- WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+ forthree(lc, heaplocktags, lc2, heapIds, lc3, indexIds)
+ {
+ LOCKTAG *heaplocktag = lfirst(lc);
+ Oid heapId = lfirst_oid(lc2);
+ Oid indexId = lfirst_oid(lc3);
- /* Finish invalidation of index and mark it as dead */
- index_concurrently_set_dead(heapId, indexId);
+ /*
+ * Now we must wait until no running transaction could be using the
+ * index for a query. Use AccessExclusiveLock here to check for
+ * running transactions that hold locks of any kind on the table. Note
+ * we do not need to worry about xacts that open the table for reading
+ * after this point; they will see the index as invalid when they open
+ * the relation.
+ *
+ * Note: the reason we use actual lock acquisition here, rather than
+ * just checking the ProcArray and sleeping, is that deadlock is
+ * possible if one of the transactions in question is blocked trying
+ * to acquire an exclusive lock on our table. The lock code will
+ * detect deadlock and error out properly.
+ *
+ * Note: we report progress through WaitForLockers() unconditionally
+ * here, even though it will only be used when we're called by REINDEX
+ * CONCURRENTLY and not when called by DROP INDEX CONCURRENTLY.
+ */
+ WaitForLockers(*heaplocktag, AccessExclusiveLock, true);
+
+ /* Finish invalidation of index and mark it as dead */
+ index_concurrently_set_dead(heapId, indexId);
+ }
/*
* Again, commit the transaction to make the pg_index update visible
@@ -2180,105 +2242,147 @@ index_drop(Oid indexId, bool concurrent, bool concurrent_lock_mode)
*/
CommitTransactionCommand();
StartTransactionCommand();
+ PushActiveSnapshot(GetTransactionSnapshot());
- /*
- * Wait till every transaction that saw the old index state has
- * finished. See above about progress reporting.
- */
- WaitForLockers(heaplocktag, AccessExclusiveLock, true);
+ MemoryContextSwitchTo(long_context);
+ forthree(lc, heaplocktags, lc2, heapIds, lc3, indexIds)
+ {
+ LOCKTAG *heaplocktag = lfirst(lc);
+ Oid heapId = lfirst_oid(lc2);
+ Oid indexId = lfirst_oid(lc3);
- /*
- * Re-open relations to allow us to complete our actions.
- *
- * At this point, nothing should be accessing the index, but lets
- * leave nothing to chance and grab AccessExclusiveLock on the index
- * before the physical deletion.
- */
- userHeapRelation = table_open(heapId, ShareUpdateExclusiveLock);
- userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ Relation userHeapRelation, userIndexRelation;
+
+ /*
+ * Wait till every transaction that saw the old index state has
+ * finished. See above about progress reporting.
+ */
+ WaitForLockers(*heaplocktag, AccessExclusiveLock, true);
+
+ /*
+ * Re-open relations to allow us to complete our actions.
+ *
+ * At this point, nothing should be accessing the index, but lets
+ * leave nothing to chance and grab AccessExclusiveLock on the index
+ * before the physical deletion.
+ */
+ userHeapRelation = table_open(heapId, ShareUpdateExclusiveLock);
+ userHeapRelations = lappend(userHeapRelations, userHeapRelation);
+ userIndexRelation = index_open(indexId, AccessExclusiveLock);
+ userIndexRelations = lappend(userIndexRelations, userIndexRelation);
+ }
+ MemoryContextSwitchTo(old_context);
}
else
{
/* Not concurrent, so just transfer predicate locks and we're good */
- TransferPredicateLocksToHeapRelation(userIndexRelation);
+ foreach(lc, userIndexRelations)
+ {
+ Relation userIndexRelation = lfirst(lc);
+ TransferPredicateLocksToHeapRelation(userIndexRelation);
+ }
}
- /*
- * Schedule physical removal of the files (if any)
- */
- if (userIndexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
- RelationDropStorage(userIndexRelation);
+ /* Schedule physical removal of the files (if any) */
+ forboth(lc, userIndexRelations, lc2, indexIds)
+ {
+ Relation userIndexRelation = lfirst(lc);
+ Oid indexId = lfirst_oid(lc2);
- /*
- * Close and flush the index's relcache entry, to ensure relcache doesn't
- * try to rebuild it while we're deleting catalog entries. We keep the
- * lock though.
- */
- index_close(userIndexRelation, NoLock);
+ if (userIndexRelation->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
+ RelationDropStorage(userIndexRelation);
- RelationForgetRelation(indexId);
+ /*
+ * Close and flush the index's relcache entry, to ensure relcache doesn't
+ * try to rebuild it while we're deleting catalog entries. We keep the
+ * lock though.
+ */
+ index_close(userIndexRelation, NoLock);
- /*
- * fix INDEX relation, and check for expressional index
- */
- indexRelation = table_open(IndexRelationId, RowExclusiveLock);
+ RelationForgetRelation(indexId);
+ }
- tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexId));
- if (!HeapTupleIsValid(tuple))
- elog(ERROR, "cache lookup failed for index %u", indexId);
+ /* fix INDEX relation, and check for expressional index */
+ indexRelation = table_open(IndexRelationId, RowExclusiveLock);
+ foreach(lc, indexIds)
+ {
+ Oid indexId = lfirst_oid(lc);
+ HeapTuple tuple;
+ bool hasexprs;
- hasexprs = !heap_attisnull(tuple, Anum_pg_index_indexprs,
- RelationGetDescr(indexRelation));
+ tuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexId));
+ if (!HeapTupleIsValid(tuple))
+ elog(ERROR, "cache lookup failed for index %u", indexId);
- CatalogTupleDelete(indexRelation, &tuple->t_self);
+ hasexprs = !heap_attisnull(tuple, Anum_pg_index_indexprs,
+ RelationGetDescr(indexRelation));
- ReleaseSysCache(tuple);
- table_close(indexRelation, RowExclusiveLock);
+ CatalogTupleDelete(indexRelation, &tuple->t_self);
- /*
- * if it has any expression columns, we might have stored statistics about
- * them.
- */
- if (hasexprs)
- RemoveStatistics(indexId, 0);
+ ReleaseSysCache(tuple);
- /*
- * fix ATTRIBUTE relation
- */
- DeleteAttributeTuples(indexId);
+ /*
+ * if it has any expression columns, we might have stored statistics about
+ * them.
+ */
+ if (hasexprs)
+ RemoveStatistics(indexId, 0);
+ }
+ table_close(indexRelation, RowExclusiveLock);
- /*
- * fix RELATION relation
- */
- DeleteRelationTuple(indexId);
+ pg_depend = table_open(DependRelationId, RowExclusiveLock);
+ foreach(lc, indexIds)
+ {
+ Oid indexId = lfirst_oid(lc);
+ /* fix ATTRIBUTE relation */
+ DeleteAttributeTuples(indexId);
+ /* fix RELATION relation */
+ DeleteRelationTuple(indexId);
+ /* fix INHERITS relation */
+ DeleteInheritsTuple(indexId, InvalidOid);
+
+ /* The original parent index will have its dependencies dropped by deleteMultiple */
+ if (is_partitioned && concurrent && indexId != origindexId)
+ {
+ ObjectAddress obj;
+ ObjectAddressSet(obj, RelationRelationId, indexId);
+ deleteOneDepends(&obj, &pg_depend, 0);
+ }
+ }
+ table_close(pg_depend, RowExclusiveLock);
- /*
- * fix INHERITS relation
- */
- DeleteInheritsTuple(indexId, InvalidOid);
+ foreach(lc, userHeapRelations)
+ {
+ Relation userHeapRelation = lfirst(lc);
- /*
- * We are presently too lazy to attempt to compute the new correct value
- * of relhasindex (the next VACUUM will fix it if necessary). So there is
- * no need to update the pg_class tuple for the owning relation. But we
- * must send out a shared-cache-inval notice on the owning relation to
- * ensure other backends update their relcache lists of indexes. (In the
- * concurrent case, this is redundant but harmless.)
- */
- CacheInvalidateRelcache(userHeapRelation);
+ /*
+ * We are presently too lazy to attempt to compute the new correct value
+ * of relhasindex (the next VACUUM will fix it if necessary). So there is
+ * no need to update the pg_class tuple for the owning relation. But we
+ * must send out a shared-cache-inval notice on the owning relation to
+ * ensure other backends update their relcache lists of indexes. (In the
+ * concurrent case, this is redundant but harmless.)
+ */
+ CacheInvalidateRelcache(userHeapRelation);
- /*
- * Close owning rel, but keep lock
- */
- table_close(userHeapRelation, NoLock);
+ /*
+ * Close owning rel, but keep lock
+ */
+ table_close(userHeapRelation, NoLock);
+ }
/*
* Release the session locks before we go.
*/
if (concurrent)
{
- UnlockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
- UnlockRelationIdForSession(&indexrelid, ShareUpdateExclusiveLock);
+ forboth(lc, heaprelids, lc2, indexrelids)
+ {
+ LockRelId *heaprelid = lfirst(lc),
+ *indexrelid = lfirst(lc2);
+ UnlockRelationIdForSession(heaprelid, ShareUpdateExclusiveLock);
+ UnlockRelationIdForSession(indexrelid, ShareUpdateExclusiveLock);
+ }
}
}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index a29c14bf1c..b6af39cc77 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1374,16 +1374,17 @@ RemoveRelations(DropStmt *drop)
flags |= PERFORM_DELETION_CONCURRENTLY;
}
- /*
- * Concurrent index drop cannot be used with partitioned indexes,
- * either.
- */
+ /* Concurrent drop of partitioned index is specially handled */
if ((flags & PERFORM_DELETION_CONCURRENTLY) != 0 &&
get_rel_relkind(relOid) == RELKIND_PARTITIONED_INDEX)
- ereport(ERROR,
- (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
- errmsg("cannot drop partitioned index \"%s\" concurrently",
- rel->relname)));
+ {
+ ObjectAddressSet(obj, RelationRelationId, relOid);
+ performDeleteParentIndexConcurrent(&obj, drop->behavior, flags);
+
+ /* There can be only one relation */
+ Assert(list_length(drop->objects) == 1);
+ break;
+ }
/* OK, we're ready to delete this one */
obj.classId = RelationRelationId;
diff --git a/src/include/catalog/dependency.h b/src/include/catalog/dependency.h
index a8f7e9965b..15fb6d90bf 100644
--- a/src/include/catalog/dependency.h
+++ b/src/include/catalog/dependency.h
@@ -149,6 +149,12 @@ extern void ReleaseDeletionLock(const ObjectAddress *object);
extern void performDeletion(const ObjectAddress *object,
DropBehavior behavior, int flags);
+extern void deleteOneDepends(const ObjectAddress *object,
+ Relation *depRel, int flags);
+
+extern void performDeleteParentIndexConcurrent(const ObjectAddress *object,
+ DropBehavior behavior, int flags);
+
extern void performMultipleDeletions(const ObjectAddresses *objects,
DropBehavior behavior, int flags);
diff --git a/src/test/regress/expected/indexing.out b/src/test/regress/expected/indexing.out
index f04abc6897..2fb00a2fa7 100644
--- a/src/test/regress/expected/indexing.out
+++ b/src/test/regress/expected/indexing.out
@@ -226,9 +226,7 @@ create table idxpart1 partition of idxpart for values from (0) to (10);
drop index idxpart1_a_idx; -- no way
ERROR: cannot drop index idxpart1_a_idx because index idxpart_a_idx requires it
HINT: You can drop index idxpart_a_idx instead.
-drop index concurrently idxpart_a_idx; -- unsupported
-ERROR: cannot drop partitioned index "idxpart_a_idx" concurrently
-drop index idxpart_a_idx; -- both indexes go away
+drop index concurrently idxpart_a_idx;
select relname, relkind from pg_class
where relname like 'idxpart%' order by relname;
relname | relkind
diff --git a/src/test/regress/sql/indexing.sql b/src/test/regress/sql/indexing.sql
index 3d4b6e9bc9..610c12de9e 100644
--- a/src/test/regress/sql/indexing.sql
+++ b/src/test/regress/sql/indexing.sql
@@ -104,8 +104,7 @@ create table idxpart (a int) partition by range (a);
create index on idxpart (a);
create table idxpart1 partition of idxpart for values from (0) to (10);
drop index idxpart1_a_idx; -- no way
-drop index concurrently idxpart_a_idx; -- unsupported
-drop index idxpart_a_idx; -- both indexes go away
+drop index concurrently idxpart_a_idx;
select relname, relkind from pg_class
where relname like 'idxpart%' order by relname;
create index on idxpart (a);
--
2.17.0
Hi Michael,
On 10/27/20 8:44 PM, Justin Pryzby wrote:
3. I have a patch which changes index_drop() to "expand" a partitioned index into
its list of children. Each of these becomes a List:
| indexId, heapId, userIndexRelation, userHeapRelation, heaplocktag, heaprelid, indexrelid
The same process is followed as for a single index, but handling all partitions
at once in two transactions total. Arguably, this is bad since that function
currently takes a single Oid but would now end up operating on a list of indexes.

This is what's implemented in the attached. It's very possible I've missed
opportunities for better simplification/integration.
It appears there are still some issues to be resolved with this patch,
but the next step seems to be for you to have a look at Justin's most
recent patch.
Regards,
--
-David
david@pgmasters.net
On Fri, Mar 05, 2021 at 09:27:05AM -0500, David Steele wrote:
It appears there are still some issues to be resolved with this patch, but
the next step seems to be for you to have a look at Justin's most recent
patch.
Not sure if I'll be able to do that by the end of this month. Looking
quickly at the patch, I am not much of a fan of the new code path
introduced for the deletion of dependent objects in the partition
tree, so my gut is telling me that we need to think harder about the
problem at hand first.
--
Michael
On Wed, Oct 28, 2020 at 6:14 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
Forking this thread, since the existing CFs have been closed.
/messages/by-id/20200914143102.GX18552@telsasoft.com
The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".
Regards,
Vignesh
On Wed, Jul 14, 2021 at 09:18:12PM +0530, vignesh C wrote:
The patch does not apply on Head anymore, could you rebase and post a
patch. I'm changing the status to "Waiting for Author".
I'm withdrawing this patch at least until the corresponding patch for CIC is
progressed.
https://commitfest.postgresql.org/33/2815/
--
Justin