CLUSTER on partitioned index

Started by Justin Pryzbyabout 5 years ago37 messages

pryzby@telsasoft.com

about 5 years ago

3 attachment(s)

Forking this thread, since the existing CFs have been closed.
/messages/by-id/20200914143102.GX18552@telsasoft.com

On Tue, Oct 06, 2020 at 01:38:23PM +0900, Michael Paquier wrote:

On Mon, Oct 05, 2020 at 10:07:33PM -0500, Justin Pryzby wrote:

Honestly, I think you're over-thinking and over-engineering indisclustered.

If "clusteredness" was something we offered to maintain across DML, I think
that might be important to provide stronger guarantees. As it is now, I don't
think this patch is worth changing the catalog definition.

Well, this use case is new because we are discussing the relationship
of indisclustered across multiple transactions for multiple indexes,
so I'd rather have this discussion than not, and I have learnt
the hard way with REINDEX that we should care a lot about the
consistency of partition trees at any step of the operation.

indisclustered is only used as a default for "CLUSTER" (without USING). The
worst thing that can happen if it's "inconsistent" is that "CLUSTER;" clusters
a table on the "old" clustered index (that it was already clustered on), which
is what would've happened before running some command which was interrupted.

Let's
imagine a simple example here, take this partition tree: p (parent),
and two partitions p1 and p2. p has two partitioned indexes i and j,
indexes also present in p1 and p2 as i1, i2, j1 and j2. Let's assume
that the user has done a CLUSTER on p USING i that completes, meaning
that i, i1 and i2 have indisclustered set. Now let's assume that the
user does a CLUSTER on p USING j this time, and that this command
fails while processing p2, meaning that indisclustered is set for j1,
i2, and perhaps i or j depending on what the patch does.

I think the state of "indisclustered" at that point is not critical.
The command failed, and the user can re-run it, or ALTER..SET CLUSTER.
Actually, I think the only inconsistent state is if two indexes are both marked
indisclustered.

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

Also, I noticed that CREATE TABLE (LIKE.. INCLUDING INDEXES) doesn't preserve
indisclustered, but I can't say that's an issue.

--
Justin

Attachments:

v1-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From dd4588352f99186f28fc666c497f85a87ac11da2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v1 1/3] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.

TODO: handle DB-WIDE "CLUSTER;" for partitioned tables
new partitions need to inherit indisclustered ?
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 169 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/nodes/parsenodes.h        |   5 +-
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 209 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index b9450e7366..0476cfff72 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -172,6 +172,12 @@ CLUSTER [VERBOSE]
     are periodically reclustered.
    </para>
 
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    index partition of the given partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 0d647e912c..1db8382a27 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/progress.h"
@@ -75,6 +77,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							TransactionId *pFreezeXid,
 							MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, bool isTopLevel, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -116,7 +121,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -127,14 +132,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -172,8 +169,32 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		/* close relation, keep lock till commit */
 		table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, stmt->options, isTopLevel);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, stmt->options, isTopLevel);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, isTopLevel, stmt->options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -183,7 +204,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -207,26 +227,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
 
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						stmt->options | CLUOPT_RECHECK,
-						isTopLevel);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, isTopLevel, stmt->options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -332,9 +333,10 @@ cluster_rel(Oid tableOid, Oid indexOid, int options, bool isTopLevel)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -378,8 +380,13 @@ cluster_rel(Oid tableOid, Oid indexOid, int options, bool isTopLevel)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -395,6 +402,14 @@ cluster_rel(Oid tableOid, Oid indexOid, int options, bool isTopLevel)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -463,6 +478,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -487,12 +505,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -565,10 +577,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool isTopLevel, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1563,3 +1571,76 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels are also processed by cluster_rel, to
+		 * call check_index_is_clusterable() and mark_index_clustered().
+		 */
+
+		/*
+		 * We have to build the list in a different memory context so it will
+		 * survive the cross-transaction processing
+		 */
+		old_context = MemoryContextSwitchTo(cluster_context);
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+
+		MemoryContextSwitchTo(old_context);
+	}
+
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, bool isTopLevel, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					options | CLUOPT_RECHECK,
+					isTopLevel);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index b2b4f1fd4d..c3f080e691 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -587,6 +587,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 60c2f45466..d428a94454 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3202,8 +3202,9 @@ typedef struct AlterSystemStmt
  */
 typedef enum ClusterOption
 {
-	CLUOPT_RECHECK = 1 << 0,	/* recheck relation state */
-	CLUOPT_VERBOSE = 1 << 1		/* print progress info */
+	CLUOPT_VERBOSE = 1 << 0,	/* print progress info */
+	CLUOPT_RECHECK = 1 << 1,	/* recheck relation state */
+	CLUOPT_RECHECK_ISCLUSTERED = 1 << 2,	/* recheck relation state for indisclustered */
 } ClusterOption;
 
 typedef struct ClusterStmt
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..e4448350e7 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -439,14 +439,62 @@ select * from clstr_temp;
 
 drop table clstr_temp;
 RESET SESSION AUTHORIZATION;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+   relname   | relkind | ?column? 
+-------------+---------+----------
+ clstrpart   | p       | t
+ clstrpart1  | p       | t
+ clstrpart11 | r       | f
+ clstrpart12 | p       | t
+ clstrpart2  | r       | f
+ clstrpart3  | p       | t
+ clstrpart33 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+    indexrelid     | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..22225dc924 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -196,12 +196,30 @@ drop table clstr_temp;
 
 RESET SESSION AUTHORIZATION;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v1-0002-preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload

From 66070ce3398d9e040f139a147a3f2f995ef51975 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v1 2/3] preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  2 +-
 src/test/regress/expected/cluster.out | 11 +++++++++++
 src/test/regress/sql/cluster.sql      |  3 +++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 3ce6eb61d5..92ca0927fa 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -601,7 +601,7 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) && get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e4448350e7..8f245da46d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,17 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
+DROP TABLE clstrpart;
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 22225dc924..b871ab53c3 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -220,6 +220,9 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+DROP TABLE clstrpart;
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v1-0003-Invalidate-indisclustered-when-attaching-uncluste.patchtext/x-diff; charset=us-asciiDownload

From 2ae2377fab3f8eebc4a45f7e762566a02e0df71d Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v1 3/3] Invalidate indisclustered when attaching unclustered
 indexes

---
 src/backend/commands/cluster.c        | 76 +++++++++++++++------------
 src/backend/commands/indexcmds.c      | 21 ++++++++
 src/test/regress/expected/cluster.out | 27 ++++++++++
 src/test/regress/sql/cluster.sql      |  9 ++++
 4 files changed, 99 insertions(+), 34 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 1db8382a27..58a0605482 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -76,6 +76,7 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool *pSwapToastByContent,
 							TransactionId *pFreezeXid,
 							MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -492,6 +493,32 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
@@ -500,20 +527,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
 	Relation	pg_index;
 	ListCell   *index;
 
-	/*
-	 * If the index is already marked clustered, no need to do anything.
-	 */
-	if (OidIsValid(indexOid))
-	{
-		if (get_index_isclustered(indexOid))
-			return;
-	}
-
 	/*
 	 * Check each index of the relation and set/clear the bit as needed.
 	 */
@@ -523,34 +539,26 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	{
 		Oid			thisIndexOid = lfirst_oid(index);
 
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
-
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
-		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-		else if (thisIndexOid == indexOid)
+		if (thisIndexOid != indexOid)
 		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+			Oid parentind = thisIndexOid;
+			set_indisclustered(thisIndexOid, false, pg_index);
+
+			/*
+			 * When setting a given index as clustered, also remove
+			 * indisclustered from all parents of other partitioned indexes
+			 */
+			while (get_rel_relispartition(parentind))
+			{
+				parentind = get_partition_parent(parentind);
+				set_indisclustered(parentind, false, pg_index);
+			}
 		}
+		else
+			set_indisclustered(thisIndexOid, true, pg_index);
 
 		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
 									 InvalidOid, is_internal);
-
-		heap_freetuple(indexTuple);
 	}
 
 	table_close(pg_index, RowExclusiveLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index d76d7a22dd..99b0bf745e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -26,12 +26,14 @@
 #include "catalog/index.h"
 #include "catalog/indexing.h"
 #include "catalog/pg_am.h"
+#include "catalog/partition.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
@@ -3894,6 +3896,25 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/* if the attached index is not clustered, invalidate all parents cluster mark, if any */
+	/* TODO: if the attached index *is* clustered, then invalidate the cluster mark on any *other* index.. */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 8f245da46d..6e6fa77c4b 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -505,6 +505,33 @@ Partition of: clstrpart FOR VALUES FROM (30) TO (40)
 Indexes:
     "clstrpart4_a_idx" btree (a) CLUSTER
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 5 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 5 (Use \d+ to list them.)
+
 DROP TABLE clstrpart;
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b871ab53c3..640922ede0 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -222,6 +222,15 @@ CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
 CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
 \d clstrpart4
+-- Check that attaching an unclustered index marks the parent unclustered:
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
 DROP TABLE clstrpart;
 
 -- Test CLUSTER with external tuplesorting
-- 
2.17.0

Justin Pryzby

pryzby@telsasoft.com

about 5 years ago

In reply to: Justin Pryzby (#1)

3 attachment(s)

Re: CLUSTER on partitioned index

@cfbot: rebased

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

--
Justin

Attachments:

v3-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From 521b91b2a5555f5688e88145714db1e990b803ea Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v3 1/3] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 167 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/nodes/parsenodes.h        |   5 +-
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 208 insertions(+), 53 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index b9450e7366..0476cfff72 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -172,6 +172,12 @@ CLUSTER [VERBOSE]
     are periodically reclustered.
    </para>
 
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    index partition of the given partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 04d12a7ece..391e018bbd 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/progress.h"
@@ -72,6 +74,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -113,7 +118,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -124,14 +129,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -169,8 +166,32 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		/* close relation, keep lock till commit */
 		table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, stmt->options);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, stmt->options);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, stmt->options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -180,7 +201,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -204,25 +224,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
 
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						stmt->options | CLUOPT_RECHECK);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, stmt->options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -328,9 +330,10 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -374,8 +377,13 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -391,6 +399,14 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -459,6 +475,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -483,12 +502,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -560,10 +573,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1557,3 +1566,75 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels are also processed by cluster_rel, to
+		 * call check_index_is_clusterable() and mark_index_clustered().
+		 */
+
+		/*
+		 * We have to build the list in a different memory context so it will
+		 * survive the cross-transaction processing
+		 */
+		old_context = MemoryContextSwitchTo(cluster_context);
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+
+		MemoryContextSwitchTo(old_context);
+	}
+
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					options | CLUOPT_RECHECK);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5238a960f7..07ef7fc1b7 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -588,6 +588,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 17b4e335c6..c862662a78 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3198,8 +3198,9 @@ typedef struct AlterSystemStmt
  */
 typedef enum ClusterOption
 {
-	CLUOPT_RECHECK = 1 << 0,	/* recheck relation state */
-	CLUOPT_VERBOSE = 1 << 1		/* print progress info */
+	CLUOPT_VERBOSE = 1 << 0,	/* print progress info */
+	CLUOPT_RECHECK = 1 << 1,	/* recheck relation state */
+	CLUOPT_RECHECK_ISCLUSTERED = 1 << 2,	/* recheck relation state for indisclustered */
 } ClusterOption;
 
 typedef struct ClusterStmt
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..e4448350e7 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -439,14 +439,62 @@ select * from clstr_temp;
 
 drop table clstr_temp;
 RESET SESSION AUTHORIZATION;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+   relname   | relkind | ?column? 
+-------------+---------+----------
+ clstrpart   | p       | t
+ clstrpart1  | p       | t
+ clstrpart11 | r       | f
+ clstrpart12 | p       | t
+ clstrpart2  | r       | f
+ clstrpart3  | p       | t
+ clstrpart33 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+    indexrelid     | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..22225dc924 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -196,12 +196,30 @@ drop table clstr_temp;
 
 RESET SESSION AUTHORIZATION;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v3-0002-preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload

From cbd643ad3cb19f500a2af9200234402acf2f0511 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v3 2/3] preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  2 +-
 src/test/regress/expected/cluster.out | 11 +++++++++++
 src/test/regress/sql/cluster.sql      |  3 +++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5ec225abe1..55d429971f 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -603,7 +603,7 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) && get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e4448350e7..8f245da46d 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,17 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
+DROP TABLE clstrpart;
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 22225dc924..b871ab53c3 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -220,6 +220,9 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+DROP TABLE clstrpart;
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v3-0003-Propogate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload

From f41adc43b9b27c7b8aec8c78297ad8042f5049e7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v3 3/3] Propogate changes to indisclustered to child/parents

---
 src/backend/commands/cluster.c        | 116 +++++++++++++++++---------
 src/backend/commands/indexcmds.c      |  20 +++++
 src/test/regress/expected/cluster.out |  43 ++++++++++
 src/test/regress/sql/cluster.sql      |  13 +++
 4 files changed, 154 insertions(+), 38 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 391e018bbd..1e36c47ec6 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -73,6 +73,7 @@ static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -489,65 +490,104 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
- * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ * With indexOid == InvalidOid, mark all indexes of rel not-clustered.
+ * Otherwise, mark children of the clustered as clustered, and parents of other
+ * indexes as unclustered.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
-	Relation	pg_index;
-	ListCell   *index;
+	ListCell	*lc, *lc2;
+	Relation	pg_index = table_open(IndexRelationId, RowExclusiveLock);
+	List		*inh = find_all_inheritors(RelationGetRelid(rel),
+			ShareRowExclusiveLock, NULL);
 
 	/*
-	 * If the index is already marked clustered, no need to do anything.
+	 * Check each index of the relation and set/clear the bit as needed.
+	 * Iterate over the relation's children rather than the index's chilren
+	 * since we need to unset cluster for indexes on intermediate children,
+	 * too.
 	 */
-	if (OidIsValid(indexOid))
+
+	foreach(lc, inh)
 	{
-		if (get_index_isclustered(indexOid))
-			return;
+		Oid			inhrelid = lfirst_oid(lc);
+		Relation	thisrel = table_open(inhrelid, ShareRowExclusiveLock);
+		List 		*indexes = RelationGetIndexList(thisrel);
+
+		foreach (lc2, indexes)
+		{
+			Oid		thisIndexOid = lfirst_oid(lc2);
+			Oid		partoid = index_get_partition(thisrel, indexOid);
+
+			if (thisIndexOid == indexOid || partoid == thisIndexOid) // OidIsValid(partoid))
+			{
+				/* A child of the clustered index should be set clustered, too */
+				set_indisclustered(thisIndexOid, true, pg_index);
+			}
+			else
+			{
+				/* indexes which are not children of the clustered index are set unclustered */
+				set_indisclustered(thisIndexOid, false, pg_index);
+			}
+
+			InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
+										 InvalidOid, is_internal);
+		}
+
+		list_free(indexes);
+		table_close(thisrel, ShareRowExclusiveLock);
 	}
 
+	list_free(inh);
+
 	/*
-	 * Check each index of the relation and set/clear the bit as needed.
+	 * When setting an index partition unclustered, also remove
+	 * indisclustered from its parents
+	 * use this instead? get_partition_ancestors()
 	 */
-	pg_index = table_open(IndexRelationId, RowExclusiveLock);
 
-	foreach(index, RelationGetIndexList(rel))
+	foreach (lc, RelationGetIndexList(rel))
 	{
-		Oid			thisIndexOid = lfirst_oid(index);
+		Oid	thisIndexOid = lfirst_oid(lc);
 
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+		if (thisIndexOid == indexOid)
+			continue;
 
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
+		while (get_rel_relispartition(thisIndexOid))
 		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+			thisIndexOid = get_partition_parent(thisIndexOid);
+			set_indisclustered(thisIndexOid, false, pg_index);
 		}
-		else if (thisIndexOid == indexOid)
-		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-
-		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
-									 InvalidOid, is_internal);
-
-		heap_freetuple(indexTuple);
 	}
 
 	table_close(pg_index, RowExclusiveLock);
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 67bd9b12dc..206eb4d656 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -26,12 +26,14 @@
 #include "catalog/index.h"
 #include "catalog/indexing.h"
 #include "catalog/pg_am.h"
+#include "catalog/partition.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
@@ -3773,6 +3775,24 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/* if the attached index is not clustered, invalidate all parents cluster mark, if any */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 8f245da46d..91d0c03055 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -505,6 +505,49 @@ Partition of: clstrpart FOR VALUES FROM (30) TO (40)
 Indexes:
     "clstrpart4_a_idx" btree (a) CLUSTER
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 5 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 5 (Use \d+ to list them.)
+
+-- Check that only one child is marked clustered after marking clustered on a different parent
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx_2;
+ERROR:  index "clstrpart_idx_2" for table "clstrpart" does not exist
+\d clstrpart1
+       Partitioned table "public.clstrpart1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (1) TO (10)
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart1_a_idx" btree (a) CLUSTER
+    "clstrpart1_idx_2" btree (a)
+Number of partitions: 2 (Use \d+ to list them.)
+
 DROP TABLE clstrpart;
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index b871ab53c3..635df97257 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -222,6 +222,19 @@ CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
 CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
 \d clstrpart4
+-- Check that attaching an unclustered index marks the parent unclustered:
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+-- Check that only one child is marked clustered after marking clustered on a different parent
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx_2;
+\d clstrpart1
 DROP TABLE clstrpart;
 
 -- Test CLUSTER with external tuplesorting
-- 
2.17.0

Justin Pryzby

pryzby@telsasoft.com

about 5 years ago

In reply to: Justin Pryzby (#2)

5 attachment(s)

Re: CLUSTER on partitioned index

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

--
Justin

Attachments:

v4-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From 75690aacef0d294e2667bd1091cf647dc9f5d187 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v4 1/5] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 167 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/nodes/parsenodes.h        |   5 +-
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 208 insertions(+), 53 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index b9450e7366..0476cfff72 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -172,6 +172,12 @@ CLUSTER [VERBOSE]
     are periodically reclustered.
    </para>
 
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    index partition of the given partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 04d12a7ece..391e018bbd 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/progress.h"
@@ -72,6 +74,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -113,7 +118,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -124,14 +129,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -169,8 +166,32 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		/* close relation, keep lock till commit */
 		table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, stmt->options);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, stmt->options);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, stmt->options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -180,7 +201,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -204,25 +224,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
 
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						stmt->options | CLUOPT_RECHECK);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, stmt->options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -328,9 +330,10 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -374,8 +377,13 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -391,6 +399,14 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -459,6 +475,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -483,12 +502,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -560,10 +573,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1557,3 +1566,75 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels are also processed by cluster_rel, to
+		 * call check_index_is_clusterable() and mark_index_clustered().
+		 */
+
+		/*
+		 * We have to build the list in a different memory context so it will
+		 * survive the cross-transaction processing
+		 */
+		old_context = MemoryContextSwitchTo(cluster_context);
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+
+		MemoryContextSwitchTo(old_context);
+	}
+
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					options | CLUOPT_RECHECK);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5238a960f7..07ef7fc1b7 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -588,6 +588,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 53de31f3b1..087b1c7af6 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3199,8 +3199,9 @@ typedef struct AlterSystemStmt
  */
 typedef enum ClusterOption
 {
-	CLUOPT_RECHECK = 1 << 0,	/* recheck relation state */
-	CLUOPT_VERBOSE = 1 << 1		/* print progress info */
+	CLUOPT_VERBOSE = 1 << 0,	/* print progress info */
+	CLUOPT_RECHECK = 1 << 1,	/* recheck relation state */
+	CLUOPT_RECHECK_ISCLUSTERED = 1 << 2,	/* recheck relation state for indisclustered */
 } ClusterOption;
 
 typedef struct ClusterStmt
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..e4448350e7 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -439,14 +439,62 @@ select * from clstr_temp;
 
 drop table clstr_temp;
 RESET SESSION AUTHORIZATION;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+   relname   | relkind | ?column? 
+-------------+---------+----------
+ clstrpart   | p       | t
+ clstrpart1  | p       | t
+ clstrpart11 | r       | f
+ clstrpart12 | p       | t
+ clstrpart2  | r       | f
+ clstrpart3  | p       | t
+ clstrpart33 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+    indexrelid     | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..22225dc924 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -196,12 +196,30 @@ drop table clstr_temp;
 
 RESET SESSION AUTHORIZATION;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v4-0002-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload

From a3ab89d138c5b2a65b5b0900910d8e01b8a270ca Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v4 2/5] Propagate changes to indisclustered to child/parents

---
 src/backend/commands/cluster.c        | 109 ++++++++++++++++----------
 src/backend/commands/indexcmds.c      |   2 +
 src/test/regress/expected/cluster.out |  46 +++++++++++
 src/test/regress/sql/cluster.sql      |  11 +++
 4 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 391e018bbd..4f30174ba7 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -73,6 +73,7 @@ static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -489,66 +490,88 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
- * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ * With indexOid == InvalidOid, mark all indexes of rel not-clustered.
+ * Otherwise, mark children of the clustered index as clustered, and parents of
+ * other indexes as unclustered.
+ * We wish to maintain the following properties:
+ * 1) Only one index on a relation can be marked clustered at once
+ * 2) If a partitioned index is clustered, then all its children must be
+ *     clustered.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
-	Relation	pg_index;
-	ListCell   *index;
-
-	/*
-	 * If the index is already marked clustered, no need to do anything.
-	 */
-	if (OidIsValid(indexOid))
-	{
-		if (get_index_isclustered(indexOid))
-			return;
-	}
+	ListCell	*lc, *lc2;
+	List		*indexes;
+	Relation	pg_index = table_open(IndexRelationId, RowExclusiveLock);
+	List		*inh = find_all_inheritors(RelationGetRelid(rel), ShareRowExclusiveLock, NULL);
 
 	/*
 	 * Check each index of the relation and set/clear the bit as needed.
+	 * Iterate over the relation's children rather than the index's children
+	 * since we need to unset cluster for indexes on intermediate children,
+	 * too.
 	 */
-	pg_index = table_open(IndexRelationId, RowExclusiveLock);
-
-	foreach(index, RelationGetIndexList(rel))
+	foreach(lc, inh)
 	{
-		Oid			thisIndexOid = lfirst_oid(index);
-
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+		Oid			inhrelid = lfirst_oid(lc);
+		Relation	thisrel = table_open(inhrelid, ShareRowExclusiveLock);
 
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
+		indexes = RelationGetIndexList(thisrel);
+		foreach (lc2, indexes)
 		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-		else if (thisIndexOid == indexOid)
-		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
+			bool	isclustered;
+			Oid		thisIndexOid = lfirst_oid(lc2);
+			List	*parentoids = get_rel_relispartition(thisIndexOid) ?
+				get_partition_ancestors(thisIndexOid) : NIL;
 
-		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
-									 InvalidOid, is_internal);
+			/*
+			 * A child of the clustered index must be set clustered;
+			 * indexes which are not children of the clustered index are
+			 * set unclustered
+			 */
+			isclustered = (thisIndexOid == indexOid) ||
+					list_member_oid(parentoids, indexOid);
+			Assert(OidIsValid(indexOid) || !isclustered);
+			set_indisclustered(thisIndexOid, isclustered, pg_index);
+
+			InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
+										 InvalidOid, is_internal);
+		}
 
-		heap_freetuple(indexTuple);
+		list_free(indexes);
+		table_close(thisrel, ShareRowExclusiveLock);
 	}
+	list_free(inh);
 
 	table_close(pg_index, RowExclusiveLock);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 54d90c28e7..538055c9a0 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -25,6 +25,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -32,6 +33,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e4448350e7..a9fb9f1021 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,52 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a)
+
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
+       Partitioned table "public.clstrpart1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (1) TO (10)
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart1_a_idx" btree (a)
+    "clstrpart1_idx_2" btree (a) CLUSTER
+Number of partitions: 2 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 22225dc924..d15bd51496 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -220,6 +220,17 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v4-0003-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload

From 491f16c3e2750ccc4a1b57d9f906149f042e30a3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 18:58:03 -0600
Subject: [PATCH v4 3/5] Invalidate parent indexes

---
 src/backend/commands/cluster.c        | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 26 ++++++++++++++++++++++++++
 src/test/regress/sql/cluster.sql      |  8 ++++++++
 3 files changed, 55 insertions(+)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 4f30174ba7..35beff6f9f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -573,6 +573,27 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	}
 	list_free(inh);
 
+	/*
+	 * Set parent of all indexes as unclustered when a rel is unclustered; and,
+	 * when an index is clustered, set parents of all /other/ indexes as
+	 * unclustered.
+	 */
+	indexes = RelationGetIndexList(rel);
+	foreach (lc, indexes)
+	{
+		Oid	thisIndexOid = lfirst_oid(lc);
+
+		if (thisIndexOid == indexOid)
+			continue;
+
+		while (get_rel_relispartition(thisIndexOid))
+		{
+			thisIndexOid = get_partition_parent(thisIndexOid);
+			set_indisclustered(thisIndexOid, false, pg_index);
+		}
+	}
+	list_free(indexes);
+
 	table_close(pg_index, RowExclusiveLock);
 }
 
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index a9fb9f1021..6d88978387 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -541,6 +541,32 @@ Indexes:
     "clstrpart1_idx_2" btree (a) CLUSTER
 Number of partitions: 2 (Use \d+ to list them.)
 
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index d15bd51496..c9bb204a93 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -231,6 +231,14 @@ CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
 \d clstrpart1
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v4-0004-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload

From 068dde5053d185a7eb94a9b9f744e730dc546d98 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 19:11:41 -0600
Subject: [PATCH v4 4/5] Invalidate parent index cluster on attach

---
 src/backend/commands/indexcmds.c      | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 14 ++++++++++++++
 src/test/regress/sql/cluster.sql      |  5 +++++
 3 files changed, 40 insertions(+)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 538055c9a0..fd8b2d56d2 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3775,6 +3775,27 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/*
+	 * If the attached index is not clustered, invalidate cluster mark on
+	 * any parents
+	 */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 6d88978387..f0c962db75 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -567,6 +567,20 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 4 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index c9bb204a93..ff7ffed6e4 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -239,6 +239,11 @@ CLUSTER clstrpart1 USING clstrpart1_idx_2;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
 \d clstrpart
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v4-0005-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload

From 0f14d0d9ea4c668fbdcbb57e8d4c9078352c4964 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v4 5/5] Preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  2 +-
 src/test/regress/expected/cluster.out | 12 ++++++++++++
 src/test/regress/sql/cluster.sql      |  4 ++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index d1e120e5ae..af1565ec4c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -603,7 +603,7 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) && get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index f0c962db75..fc642f87d2 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -581,6 +581,18 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 4 (Use \d+ to list them.)
 
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index ff7ffed6e4..5df338f60d 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -244,6 +244,10 @@ ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
 ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
 \d clstrpart
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

Justin Pryzby

pryzby@telsasoft.com

about 5 years ago

In reply to: Justin Pryzby (#3)

6 attachment(s)

Re: CLUSTER on partitioned index

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

--
Justin

Attachments:

v5-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From 34944ed4a8e0313c7cb2c8eb996196ad9860b77d Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v5 2/7] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 167 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/nodes/parsenodes.h        |   5 +-
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 208 insertions(+), 53 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index b9450e7366..0476cfff72 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -172,6 +172,12 @@ CLUSTER [VERBOSE]
     are periodically reclustered.
    </para>
 
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    index partition of the given partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 04d12a7ece..391e018bbd 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/progress.h"
@@ -72,6 +74,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -113,7 +118,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -124,14 +129,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -169,8 +166,32 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		/* close relation, keep lock till commit */
 		table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, stmt->options);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, stmt->options);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, stmt->options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -180,7 +201,6 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -204,25 +224,7 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
 
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						stmt->options | CLUOPT_RECHECK);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, stmt->options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -328,9 +330,10 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -374,8 +377,13 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -391,6 +399,14 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -459,6 +475,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -483,12 +502,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -560,10 +573,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1557,3 +1566,75 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels are also processed by cluster_rel, to
+		 * call check_index_is_clusterable() and mark_index_clustered().
+		 */
+
+		/*
+		 * We have to build the list in a different memory context so it will
+		 * survive the cross-transaction processing
+		 */
+		old_context = MemoryContextSwitchTo(cluster_context);
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+
+		MemoryContextSwitchTo(old_context);
+	}
+
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					options | CLUOPT_RECHECK);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 8afc780acc..1deafd2707 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -588,6 +588,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index d1f9ef29ca..43b6d16a13 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3198,8 +3198,9 @@ typedef struct AlterSystemStmt
  */
 typedef enum ClusterOption
 {
-	CLUOPT_RECHECK = 1 << 0,	/* recheck relation state */
-	CLUOPT_VERBOSE = 1 << 1		/* print progress info */
+	CLUOPT_VERBOSE = 1 << 0,	/* print progress info */
+	CLUOPT_RECHECK = 1 << 1,	/* recheck relation state */
+	CLUOPT_RECHECK_ISCLUSTERED = 1 << 2,	/* recheck relation state for indisclustered */
 } ClusterOption;
 
 typedef struct ClusterStmt
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..e4448350e7 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -439,14 +439,62 @@ select * from clstr_temp;
 
 drop table clstr_temp;
 RESET SESSION AUTHORIZATION;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+   relname   | relkind | ?column? 
+-------------+---------+----------
+ clstrpart   | p       | t
+ clstrpart1  | p       | t
+ clstrpart11 | r       | f
+ clstrpart12 | p       | t
+ clstrpart2  | r       | f
+ clstrpart3  | p       | t
+ clstrpart33 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+    indexrelid     | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..22225dc924 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -196,12 +196,30 @@ drop table clstr_temp;
 
 RESET SESSION AUTHORIZATION;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v5-0003-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload

From 26723246af5b8bd48f85e6669822199511df05a5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v5 3/7] Propagate changes to indisclustered to child/parents

---
 src/backend/commands/cluster.c        | 109 ++++++++++++++++----------
 src/backend/commands/indexcmds.c      |   2 +
 src/test/regress/expected/cluster.out |  46 +++++++++++
 src/test/regress/sql/cluster.sql      |  11 +++
 4 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 391e018bbd..4f30174ba7 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -73,6 +73,7 @@ static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -489,66 +490,88 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
- * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ * With indexOid == InvalidOid, mark all indexes of rel not-clustered.
+ * Otherwise, mark children of the clustered index as clustered, and parents of
+ * other indexes as unclustered.
+ * We wish to maintain the following properties:
+ * 1) Only one index on a relation can be marked clustered at once
+ * 2) If a partitioned index is clustered, then all its children must be
+ *     clustered.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
-	Relation	pg_index;
-	ListCell   *index;
-
-	/*
-	 * If the index is already marked clustered, no need to do anything.
-	 */
-	if (OidIsValid(indexOid))
-	{
-		if (get_index_isclustered(indexOid))
-			return;
-	}
+	ListCell	*lc, *lc2;
+	List		*indexes;
+	Relation	pg_index = table_open(IndexRelationId, RowExclusiveLock);
+	List		*inh = find_all_inheritors(RelationGetRelid(rel), ShareRowExclusiveLock, NULL);
 
 	/*
 	 * Check each index of the relation and set/clear the bit as needed.
+	 * Iterate over the relation's children rather than the index's children
+	 * since we need to unset cluster for indexes on intermediate children,
+	 * too.
 	 */
-	pg_index = table_open(IndexRelationId, RowExclusiveLock);
-
-	foreach(index, RelationGetIndexList(rel))
+	foreach(lc, inh)
 	{
-		Oid			thisIndexOid = lfirst_oid(index);
-
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+		Oid			inhrelid = lfirst_oid(lc);
+		Relation	thisrel = table_open(inhrelid, ShareRowExclusiveLock);
 
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
+		indexes = RelationGetIndexList(thisrel);
+		foreach (lc2, indexes)
 		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-		else if (thisIndexOid == indexOid)
-		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
+			bool	isclustered;
+			Oid		thisIndexOid = lfirst_oid(lc2);
+			List	*parentoids = get_rel_relispartition(thisIndexOid) ?
+				get_partition_ancestors(thisIndexOid) : NIL;
 
-		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
-									 InvalidOid, is_internal);
+			/*
+			 * A child of the clustered index must be set clustered;
+			 * indexes which are not children of the clustered index are
+			 * set unclustered
+			 */
+			isclustered = (thisIndexOid == indexOid) ||
+					list_member_oid(parentoids, indexOid);
+			Assert(OidIsValid(indexOid) || !isclustered);
+			set_indisclustered(thisIndexOid, isclustered, pg_index);
+
+			InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
+										 InvalidOid, is_internal);
+		}
 
-		heap_freetuple(indexTuple);
+		list_free(indexes);
+		table_close(thisrel, ShareRowExclusiveLock);
 	}
+	list_free(inh);
 
 	table_close(pg_index, RowExclusiveLock);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ca24620fd0..cd9ca1beff 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -25,6 +25,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -32,6 +33,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e4448350e7..a9fb9f1021 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,52 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a)
+
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
+       Partitioned table "public.clstrpart1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (1) TO (10)
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart1_a_idx" btree (a)
+    "clstrpart1_idx_2" btree (a) CLUSTER
+Number of partitions: 2 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 22225dc924..d15bd51496 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -220,6 +220,17 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v5-0004-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload

From c16bda5aa5dd18226b5348f7e259e9b7192e9e4f Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 18:58:03 -0600
Subject: [PATCH v5 4/7] Invalidate parent indexes

---
 src/backend/commands/cluster.c        | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 26 ++++++++++++++++++++++++++
 src/test/regress/sql/cluster.sql      |  8 ++++++++
 3 files changed, 55 insertions(+)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 4f30174ba7..35beff6f9f 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -573,6 +573,27 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	}
 	list_free(inh);
 
+	/*
+	 * Set parent of all indexes as unclustered when a rel is unclustered; and,
+	 * when an index is clustered, set parents of all /other/ indexes as
+	 * unclustered.
+	 */
+	indexes = RelationGetIndexList(rel);
+	foreach (lc, indexes)
+	{
+		Oid	thisIndexOid = lfirst_oid(lc);
+
+		if (thisIndexOid == indexOid)
+			continue;
+
+		while (get_rel_relispartition(thisIndexOid))
+		{
+			thisIndexOid = get_partition_parent(thisIndexOid);
+			set_indisclustered(thisIndexOid, false, pg_index);
+		}
+	}
+	list_free(indexes);
+
 	table_close(pg_index, RowExclusiveLock);
 }
 
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index a9fb9f1021..6d88978387 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -541,6 +541,32 @@ Indexes:
     "clstrpart1_idx_2" btree (a) CLUSTER
 Number of partitions: 2 (Use \d+ to list them.)
 
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index d15bd51496..c9bb204a93 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -231,6 +231,14 @@ CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
 \d clstrpart1
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v5-0005-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload

From e8546faf5afbb8d23919d0631f2c9b7e714c2587 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 19:11:41 -0600
Subject: [PATCH v5 5/7] Invalidate parent index cluster on attach

---
 src/backend/commands/indexcmds.c      | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 14 ++++++++++++++
 src/test/regress/sql/cluster.sql      |  5 +++++
 3 files changed, 40 insertions(+)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index cd9ca1beff..fd18fe2584 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -3864,6 +3864,27 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/*
+	 * If the attached index is not clustered, invalidate cluster mark on
+	 * any parents
+	 */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 6d88978387..f0c962db75 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -567,6 +567,20 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 4 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index c9bb204a93..ff7ffed6e4 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -239,6 +239,11 @@ CLUSTER clstrpart1 USING clstrpart1_idx_2;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
 \d clstrpart
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v5-0006-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload

From 2be544396d452f3c05faf066d1fcc08ffadac025 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v5 6/7] Preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  2 +-
 src/test/regress/expected/cluster.out | 12 ++++++++++++
 src/test/regress/sql/cluster.sql      |  4 ++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 731610c701..122c63c415 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -603,7 +603,7 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) && get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index f0c962db75..fc642f87d2 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -581,6 +581,18 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 4 (Use \d+ to list them.)
 
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index ff7ffed6e4..5df338f60d 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -244,6 +244,10 @@ ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
 ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
 \d clstrpart
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v5-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload

From 2bcb70391a15b605c090f585668161079aa2b0b5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 26 Nov 2020 14:37:08 -0600
Subject: [PATCH v5 1/7] pg_dump: make CLUSTER ON a separate dump object..

..since it needs to be restored after any child indexes are restored *and
attached*.  The order needs to be:

1) restore child and parent index (order doesn't matter);
2) attach child index;
3) set cluster on child and parent index (order doesn't matter);
---
 src/bin/pg_dump/pg_dump.c      | 86 ++++++++++++++++++++++++++--------
 src/bin/pg_dump/pg_dump.h      |  8 ++++
 src/bin/pg_dump/pg_dump_sort.c | 28 ++++++-----
 3 files changed, 91 insertions(+), 31 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index dc1d41dd8d..ad990f5f8d 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -207,6 +207,7 @@ static void dumpSequence(Archive *fout, TableInfo *tbinfo);
 static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
 static void dumpIndex(Archive *fout, IndxInfo *indxinfo);
 static void dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo);
+static void dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo);
 static void dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo);
 static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo);
 static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo);
@@ -7036,6 +7037,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				i_inddependcollversions;
 	int			ntups;
 
+	int	ncluster = 0;
+	IndexClusterInfo *clusterinfo;
+	clusterinfo = (IndexClusterInfo *)
+		pg_malloc0(numTables * sizeof(IndexClusterInfo));
+
 	for (i = 0; i < numTables; i++)
 	{
 		TableInfo  *tbinfo = &tblinfo[i];
@@ -7415,6 +7421,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				/* Plain secondary index */
 				indxinfo[j].indexconstraint = 0;
 			}
+
+			/* Record each table's CLUSTERed index, if any */
+			if (indxinfo[j].indisclustered)
+			{
+				IndxInfo   *index = &indxinfo[j];
+				IndexClusterInfo *cluster = &clusterinfo[ncluster];
+
+				cluster->dobj.objType = DO_INDEX_CLUSTER_ON;
+				cluster->dobj.catId.tableoid = 0;
+				cluster->dobj.catId.oid = 0;
+				AssignDumpId(&cluster->dobj);
+				cluster->dobj.name = pg_strdup(index->dobj.name);
+				cluster->dobj.namespace = index->indextable->dobj.namespace;
+				cluster->index = index;
+				cluster->indextable = &tblinfo[i];
+
+				/* The CLUSTER ON depends on its index.. */
+				addObjectDependency(&cluster->dobj, index->dobj.dumpId);
+
+				ncluster++;
+			}
 		}
 
 		PQclear(res);
@@ -10221,6 +10248,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (SubscriptionInfo *) dobj);
 			break;
+		case DO_INDEX_CLUSTER_ON:
+			dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -16408,6 +16438,41 @@ getAttrName(int attrnum, TableInfo *tblInfo)
 	return NULL;				/* keep compiler quiet */
 }
 
+/*
+ * dumpIndexClusterOn
+ *	  record that the index is clustered.
+ */
+static void
+dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+	TableInfo	*tbinfo = clusterinfo->indextable;
+	char		*qindxname;
+	PQExpBuffer	q;
+
+	if (dopt->dataOnly)
+		return;
+
+	q = createPQExpBuffer();
+	qindxname = pg_strdup(fmtId(clusterinfo->dobj.name));
+
+	/* index name is not qualified in this syntax */
+	appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER ON %s;\n",
+					  fmtQualifiedDumpable(tbinfo), qindxname);
+
+	if (clusterinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+		ArchiveEntry(fout, clusterinfo->dobj.catId, clusterinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = clusterinfo->dobj.name,
+								  .namespace = tbinfo->dobj.namespace->dobj.name,
+								  .owner = tbinfo->rolname,
+								  .description = "INDEX CLUSTER ON",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = q->data));
+
+	destroyPQExpBuffer(q);
+	free(qindxname);
+}
+
 /*
  * dumpIndex
  *	  write out to fout a user-defined index
@@ -16462,16 +16527,6 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
 		 * similar code in dumpConstraint!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  qindxname);
-		}
-
 		/*
 		 * If the index has any statistics on some of its columns, generate
 		 * the associated ALTER INDEX queries.
@@ -16790,16 +16845,6 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 		 * similar code in dumpIndex!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  fmtId(indxinfo->dobj.name));
-		}
-
 		/* If the index defines identity, we need to record that. */
 		if (indxinfo->indisreplident)
 		{
@@ -18304,6 +18349,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 				break;
 			case DO_INDEX:
 			case DO_INDEX_ATTACH:
+			case DO_INDEX_CLUSTER_ON:
 			case DO_STATSEXT:
 			case DO_REFRESH_MATVIEW:
 			case DO_TRIGGER:
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 317bb83970..aa51efbe2c 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -53,6 +53,7 @@ typedef enum
 	DO_ATTRDEF,
 	DO_INDEX,
 	DO_INDEX_ATTACH,
+	DO_INDEX_CLUSTER_ON,
 	DO_STATSEXT,
 	DO_RULE,
 	DO_TRIGGER,
@@ -377,6 +378,13 @@ typedef struct _indxInfo
 	DumpId		indexconstraint;
 } IndxInfo;
 
+typedef struct _indexClusterInfo
+{
+	DumpableObject dobj;
+	TableInfo  *indextable;		/* link to table the index is for */
+	IndxInfo   *index;		/* link to index itself */
+} IndexClusterInfo;
+
 typedef struct _indexAttachInfo
 {
 	DumpableObject dobj;
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 654e2ec514..5f0206bbf4 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -57,11 +57,12 @@ static const int dbObjectTypePriority[] =
 	20,							/* DO_ATTRDEF */
 	28,							/* DO_INDEX */
 	29,							/* DO_INDEX_ATTACH */
-	30,							/* DO_STATSEXT */
-	31,							/* DO_RULE */
-	32,							/* DO_TRIGGER */
+	30,							/* DO_INDEX_CLUSTER_ON */
+	31,							/* DO_STATSEXT */
+	32,							/* DO_RULE */
+	33,							/* DO_TRIGGER */
 	27,							/* DO_CONSTRAINT */
-	33,							/* DO_FK_CONSTRAINT */
+	34,							/* DO_FK_CONSTRAINT */
 	2,							/* DO_PROCLANG */
 	10,							/* DO_CAST */
 	23,							/* DO_TABLE_DATA */
@@ -73,18 +74,18 @@ static const int dbObjectTypePriority[] =
 	15,							/* DO_TSCONFIG */
 	16,							/* DO_FDW */
 	17,							/* DO_FOREIGN_SERVER */
-	38,							/* DO_DEFAULT_ACL --- done in ACL pass */
+	39,							/* DO_DEFAULT_ACL --- done in ACL pass */
 	3,							/* DO_TRANSFORM */
 	21,							/* DO_BLOB */
 	25,							/* DO_BLOB_DATA */
 	22,							/* DO_PRE_DATA_BOUNDARY */
 	26,							/* DO_POST_DATA_BOUNDARY */
-	39,							/* DO_EVENT_TRIGGER --- next to last! */
-	40,							/* DO_REFRESH_MATVIEW --- last! */
-	34,							/* DO_POLICY */
-	35,							/* DO_PUBLICATION */
-	36,							/* DO_PUBLICATION_REL */
-	37							/* DO_SUBSCRIPTION */
+	40,							/* DO_EVENT_TRIGGER --- next to last! */
+	41,							/* DO_REFRESH_MATVIEW --- last! */
+	35,							/* DO_POLICY */
+	36,							/* DO_PUBLICATION */
+	37,							/* DO_PUBLICATION_REL */
+	38							/* DO_SUBSCRIPTION */
 };
 
 StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
@@ -1292,6 +1293,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "INDEX ATTACH %s  (ID %d)",
 					 obj->name, obj->dumpId);
 			return;
+		case DO_INDEX_CLUSTER_ON:
+			snprintf(buf, bufsize,
+					 "INDEX CLUSTER ON %s  (ID %d)",
+					 obj->name, obj->dumpId);
+			return;
 		case DO_STATSEXT:
 			snprintf(buf, bufsize,
 					 "STATISTICS %s  (ID %d OID %u)",
-- 
2.17.0

Justin Pryzby

pryzby@telsasoft.com

almost 5 years ago

In reply to: Justin Pryzby (#4)

7 attachment(s)

Re: CLUSTER on partitioned index

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This patch is intertwined with the tablespace patch: not only will it get
rebase conflict, but will also need to test the functionality of
CLUSTER (TABLESPACE a) partitioned_table;

--
Justin

Attachments:

v6-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload

From cb817c479d2a2d4ae94cfa8d215e369b5d206738 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 26 Nov 2020 14:37:08 -0600
Subject: [PATCH v6 1/7] pg_dump: make CLUSTER ON a separate dump object..

..since it needs to be restored after any child indexes are restored *and
attached*.  The order needs to be:

1) restore child and parent index (order doesn't matter);
2) attach child index;
3) set cluster on child and parent index (order doesn't matter);
---
 src/bin/pg_dump/pg_dump.c      | 86 ++++++++++++++++++++++++++--------
 src/bin/pg_dump/pg_dump.h      |  8 ++++
 src/bin/pg_dump/pg_dump_sort.c |  8 ++++
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 798d14580e..e6526392e5 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -208,6 +208,7 @@ static void dumpSequence(Archive *fout, TableInfo *tbinfo);
 static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
 static void dumpIndex(Archive *fout, IndxInfo *indxinfo);
 static void dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo);
+static void dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo);
 static void dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo);
 static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo);
 static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo);
@@ -7092,6 +7093,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				i_inddependcollversions;
 	int			ntups;
 
+	int	ncluster = 0;
+	IndexClusterInfo *clusterinfo;
+	clusterinfo = (IndexClusterInfo *)
+		pg_malloc0(numTables * sizeof(IndexClusterInfo));
+
 	for (i = 0; i < numTables; i++)
 	{
 		TableInfo  *tbinfo = &tblinfo[i];
@@ -7471,6 +7477,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				/* Plain secondary index */
 				indxinfo[j].indexconstraint = 0;
 			}
+
+			/* Record each table's CLUSTERed index, if any */
+			if (indxinfo[j].indisclustered)
+			{
+				IndxInfo   *index = &indxinfo[j];
+				IndexClusterInfo *cluster = &clusterinfo[ncluster];
+
+				cluster->dobj.objType = DO_INDEX_CLUSTER_ON;
+				cluster->dobj.catId.tableoid = 0;
+				cluster->dobj.catId.oid = 0;
+				AssignDumpId(&cluster->dobj);
+				cluster->dobj.name = pg_strdup(index->dobj.name);
+				cluster->dobj.namespace = index->indextable->dobj.namespace;
+				cluster->index = index;
+				cluster->indextable = &tblinfo[i];
+
+				/* The CLUSTER ON depends on its index.. */
+				addObjectDependency(&cluster->dobj, index->dobj.dumpId);
+
+				ncluster++;
+			}
 		}
 
 		PQclear(res);
@@ -10296,6 +10323,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (SubscriptionInfo *) dobj);
 			break;
+		case DO_INDEX_CLUSTER_ON:
+			dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -16516,6 +16546,41 @@ getAttrName(int attrnum, TableInfo *tblInfo)
 	return NULL;				/* keep compiler quiet */
 }
 
+/*
+ * dumpIndexClusterOn
+ *	  record that the index is clustered.
+ */
+static void
+dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+	TableInfo	*tbinfo = clusterinfo->indextable;
+	char		*qindxname;
+	PQExpBuffer	q;
+
+	if (dopt->dataOnly)
+		return;
+
+	q = createPQExpBuffer();
+	qindxname = pg_strdup(fmtId(clusterinfo->dobj.name));
+
+	/* index name is not qualified in this syntax */
+	appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER ON %s;\n",
+					  fmtQualifiedDumpable(tbinfo), qindxname);
+
+	if (clusterinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+		ArchiveEntry(fout, clusterinfo->dobj.catId, clusterinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = clusterinfo->dobj.name,
+								  .namespace = tbinfo->dobj.namespace->dobj.name,
+								  .owner = tbinfo->rolname,
+								  .description = "INDEX CLUSTER ON",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = q->data));
+
+	destroyPQExpBuffer(q);
+	free(qindxname);
+}
+
 /*
  * dumpIndex
  *	  write out to fout a user-defined index
@@ -16570,16 +16635,6 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
 		 * similar code in dumpConstraint!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  qindxname);
-		}
-
 		/*
 		 * If the index has any statistics on some of its columns, generate
 		 * the associated ALTER INDEX queries.
@@ -16906,16 +16961,6 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 		 * similar code in dumpIndex!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  fmtId(indxinfo->dobj.name));
-		}
-
 		/* If the index defines identity, we need to record that. */
 		if (indxinfo->indisreplident)
 		{
@@ -18421,6 +18466,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 				break;
 			case DO_INDEX:
 			case DO_INDEX_ATTACH:
+			case DO_INDEX_CLUSTER_ON:
 			case DO_STATSEXT:
 			case DO_REFRESH_MATVIEW:
 			case DO_TRIGGER:
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 1290f9659b..57def4c009 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -54,6 +54,7 @@ typedef enum
 	DO_ATTRDEF,
 	DO_INDEX,
 	DO_INDEX_ATTACH,
+	DO_INDEX_CLUSTER_ON,
 	DO_STATSEXT,
 	DO_RULE,
 	DO_TRIGGER,
@@ -386,6 +387,13 @@ typedef struct _indxInfo
 	DumpId		indexconstraint;
 } IndxInfo;
 
+typedef struct _indexClusterInfo
+{
+	DumpableObject dobj;
+	TableInfo  *indextable;		/* link to table the index is for */
+	IndxInfo   *index;		/* link to index itself */
+} IndexClusterInfo;
+
 typedef struct _indexAttachInfo
 {
 	DumpableObject dobj;
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 46461fb6a1..dd5b233196 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -75,6 +75,7 @@ enum dbObjectTypePriorities
 	PRIO_CONSTRAINT,
 	PRIO_INDEX,
 	PRIO_INDEX_ATTACH,
+	PRIO_INDEX_CLUSTER_ON,
 	PRIO_STATSEXT,
 	PRIO_RULE,
 	PRIO_TRIGGER,
@@ -108,6 +109,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_ATTRDEF,				/* DO_ATTRDEF */
 	PRIO_INDEX,					/* DO_INDEX */
 	PRIO_INDEX_ATTACH,			/* DO_INDEX_ATTACH */
+	PRIO_INDEX_CLUSTER_ON,			/* DO_INDEX_CLUSTER_ON */
 	PRIO_STATSEXT,				/* DO_STATSEXT */
 	PRIO_RULE,					/* DO_RULE */
 	PRIO_TRIGGER,				/* DO_TRIGGER */
@@ -136,6 +138,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+
 };
 
 StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
@@ -1348,6 +1351,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "INDEX ATTACH %s  (ID %d)",
 					 obj->name, obj->dumpId);
 			return;
+		case DO_INDEX_CLUSTER_ON:
+			snprintf(buf, bufsize,
+					 "INDEX CLUSTER ON %s  (ID %d)",
+					 obj->name, obj->dumpId);
+			return;
 		case DO_STATSEXT:
 			snprintf(buf, bufsize,
 					 "STATISTICS %s  (ID %d OID %u)",
-- 
2.17.0

v6-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From f5b22edbfbf80e057a5795ab00e92ce41ad6168c Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v6 2/7] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 172 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/commands/cluster.h        |   1 +
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 208 insertions(+), 54 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 5dd21a0189..fb5deddb35 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -192,6 +192,12 @@ CLUSTER [VERBOSE]
     are periodically reclustered.
    </para>
 
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    index partition of the given partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 096a06f7b3..838bcd9e72 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -135,7 +140,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -146,14 +151,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -191,8 +188,32 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		/* close relation, keep lock till commit */
 		table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, &params);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, &params);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, params.options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -202,7 +223,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -225,28 +245,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-			ClusterParams cluster_params = params;
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_params.options |= CLUOPT_RECHECK;
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						&cluster_params);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -352,9 +351,10 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -398,8 +398,13 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -415,6 +420,14 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -483,6 +496,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -507,12 +523,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -584,10 +594,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1582,3 +1588,77 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels are also processed by cluster_rel, to
+		 * call check_index_is_clusterable() and mark_index_clustered().
+		 */
+
+		/*
+		 * We have to build the list in a different memory context so it will
+		 * survive the cross-transaction processing
+		 */
+		old_context = MemoryContextSwitchTo(cluster_context);
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+
+		MemoryContextSwitchTo(old_context);
+	}
+
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+		ClusterParams cluster_params = { .options = options, };
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_params.options |= CLUOPT_RECHECK;
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					&cluster_params);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 6abcbea963..3a6a4831e1 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -588,6 +588,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index a941f2accd..c30ca01726 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -22,6 +22,7 @@
 /* flag bits for ClusterParams->flags */
 #define CLUOPT_RECHECK 0x01		/* recheck relation state */
 #define CLUOPT_VERBOSE 0x02		/* print progress info */
+#define CLUOPT_RECHECK_ISCLUSTERED 0x04	/* recheck relation state for indisclustered */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..e4448350e7 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -439,14 +439,62 @@ select * from clstr_temp;
 
 drop table clstr_temp;
 RESET SESSION AUTHORIZATION;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+   relname   | relkind | ?column? 
+-------------+---------+----------
+ clstrpart   | p       | t
+ clstrpart1  | p       | t
+ clstrpart11 | r       | f
+ clstrpart12 | p       | t
+ clstrpart2  | r       | f
+ clstrpart3  | p       | t
+ clstrpart33 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+    indexrelid     | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..22225dc924 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -196,12 +196,30 @@ drop table clstr_temp;
 
 RESET SESSION AUTHORIZATION;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY 1;
+-- Check that clustering sets new indisclustered:
+SELECT i.indexrelid::regclass::text, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY 1;
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v6-0003-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload

From d99d6b22da07f1c575403411ee8d042e8bae34f9 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v6 3/7] Propagate changes to indisclustered to child/parents

---
 src/backend/commands/cluster.c        | 109 ++++++++++++++++----------
 src/backend/commands/indexcmds.c      |   2 +
 src/test/regress/expected/cluster.out |  46 +++++++++++
 src/test/regress/sql/cluster.sql      |  11 +++
 4 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 838bcd9e72..078b97fbb9 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -74,6 +74,7 @@ static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -510,66 +511,88 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
- * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ * With indexOid == InvalidOid, mark all indexes of rel not-clustered.
+ * Otherwise, mark children of the clustered index as clustered, and parents of
+ * other indexes as unclustered.
+ * We wish to maintain the following properties:
+ * 1) Only one index on a relation can be marked clustered at once
+ * 2) If a partitioned index is clustered, then all its children must be
+ *     clustered.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
-	Relation	pg_index;
-	ListCell   *index;
-
-	/*
-	 * If the index is already marked clustered, no need to do anything.
-	 */
-	if (OidIsValid(indexOid))
-	{
-		if (get_index_isclustered(indexOid))
-			return;
-	}
+	ListCell	*lc, *lc2;
+	List		*indexes;
+	Relation	pg_index = table_open(IndexRelationId, RowExclusiveLock);
+	List		*inh = find_all_inheritors(RelationGetRelid(rel), ShareRowExclusiveLock, NULL);
 
 	/*
 	 * Check each index of the relation and set/clear the bit as needed.
+	 * Iterate over the relation's children rather than the index's children
+	 * since we need to unset cluster for indexes on intermediate children,
+	 * too.
 	 */
-	pg_index = table_open(IndexRelationId, RowExclusiveLock);
-
-	foreach(index, RelationGetIndexList(rel))
+	foreach(lc, inh)
 	{
-		Oid			thisIndexOid = lfirst_oid(index);
-
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+		Oid			inhrelid = lfirst_oid(lc);
+		Relation	thisrel = table_open(inhrelid, ShareRowExclusiveLock);
 
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
+		indexes = RelationGetIndexList(thisrel);
+		foreach (lc2, indexes)
 		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-		else if (thisIndexOid == indexOid)
-		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
+			bool	isclustered;
+			Oid		thisIndexOid = lfirst_oid(lc2);
+			List	*parentoids = get_rel_relispartition(thisIndexOid) ?
+				get_partition_ancestors(thisIndexOid) : NIL;
 
-		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
-									 InvalidOid, is_internal);
+			/*
+			 * A child of the clustered index must be set clustered;
+			 * indexes which are not children of the clustered index are
+			 * set unclustered
+			 */
+			isclustered = (thisIndexOid == indexOid) ||
+					list_member_oid(parentoids, indexOid);
+			Assert(OidIsValid(indexOid) || !isclustered);
+			set_indisclustered(thisIndexOid, isclustered, pg_index);
+
+			InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
+										 InvalidOid, is_internal);
+		}
 
-		heap_freetuple(indexTuple);
+		list_free(indexes);
+		table_close(thisrel, ShareRowExclusiveLock);
 	}
+	list_free(inh);
 
 	table_close(pg_index, RowExclusiveLock);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f9f3ff3b62..0a65698c28 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -25,6 +25,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -32,6 +33,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e4448350e7..a9fb9f1021 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,52 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a)
+
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
+       Partitioned table "public.clstrpart1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (1) TO (10)
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart1_a_idx" btree (a)
+    "clstrpart1_idx_2" btree (a) CLUSTER
+Number of partitions: 2 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 22225dc924..d15bd51496 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -220,6 +220,17 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v6-0004-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload

From 690be02168bce28b8f110946d4f04f449639f2a2 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 18:58:03 -0600
Subject: [PATCH v6 4/7] Invalidate parent indexes

---
 src/backend/commands/cluster.c        | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 26 ++++++++++++++++++++++++++
 src/test/regress/sql/cluster.sql      |  8 ++++++++
 3 files changed, 55 insertions(+)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 078b97fbb9..cdb49985ce 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -594,6 +594,27 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	}
 	list_free(inh);
 
+	/*
+	 * Set parent of all indexes as unclustered when a rel is unclustered; and,
+	 * when an index is clustered, set parents of all /other/ indexes as
+	 * unclustered.
+	 */
+	indexes = RelationGetIndexList(rel);
+	foreach (lc, indexes)
+	{
+		Oid	thisIndexOid = lfirst_oid(lc);
+
+		if (thisIndexOid == indexOid)
+			continue;
+
+		while (get_rel_relispartition(thisIndexOid))
+		{
+			thisIndexOid = get_partition_parent(thisIndexOid);
+			set_indisclustered(thisIndexOid, false, pg_index);
+		}
+	}
+	list_free(indexes);
+
 	table_close(pg_index, RowExclusiveLock);
 }
 
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index a9fb9f1021..6d88978387 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -541,6 +541,32 @@ Indexes:
     "clstrpart1_idx_2" btree (a) CLUSTER
 Number of partitions: 2 (Use \d+ to list them.)
 
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index d15bd51496..c9bb204a93 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -231,6 +231,14 @@ CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
 \d clstrpart1
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v6-0005-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload

From 811dbf5135da57cd0ee14f213f566b092a264c3a Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 19:11:41 -0600
Subject: [PATCH v6 5/7] Invalidate parent index cluster on attach

---
 src/backend/commands/indexcmds.c      | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 14 ++++++++++++++
 src/test/regress/sql/cluster.sql      |  5 +++++
 3 files changed, 40 insertions(+)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 0a65698c28..ffc809822b 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4006,6 +4006,27 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/*
+	 * If the attached index is not clustered, invalidate cluster mark on
+	 * any parents
+	 */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 6d88978387..f0c962db75 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -567,6 +567,20 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 4 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index c9bb204a93..ff7ffed6e4 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -239,6 +239,11 @@ CLUSTER clstrpart1 USING clstrpart1_idx_2;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
 \d clstrpart
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v6-0006-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload

From 50edd08cf9893e2d29c849825b812a6706d4bc61 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v6 6/7] Preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  2 +-
 src/test/regress/expected/cluster.out | 12 ++++++++++++
 src/test/regress/sql/cluster.sql      |  4 ++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b8cd35e995..e15fb42c9c 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -603,7 +603,7 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) && get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index f0c962db75..fc642f87d2 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -581,6 +581,18 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 4 (Use \d+ to list them.)
 
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index ff7ffed6e4..5df338f60d 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -244,6 +244,10 @@ ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
 ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
 \d clstrpart
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v6-0007-pg_dump-partitioned-index-depend-on-its-partition.patchtext/x-diff; charset=us-asciiDownload

From b55a919ef90e82f5233b2a8f789c129b264615cb Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 25 Nov 2020 17:34:07 -0600
Subject: [PATCH v6 7/7] pg_dump: partitioned index depend on its partitions

This is required for restoring clustered parent index, which is marked INVALID
until indexes have been built on all its child tables, and it's prohibited to
CLUSTER ON an INVALID index

See also: 8cff4f5348d075e063100071013f00a900c32b0f
---
 src/backend/commands/tablecmds.c | 6 +++---
 src/bin/pg_dump/common.c         | 8 ++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 8687e9a97c..865ab6c2e9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17493,6 +17493,9 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		table_close(idxRel, RowExclusiveLock);
 	}
 
+	/* make sure we see the validation we just did */
+	CommandCounterIncrement();
+
 	/*
 	 * If this index is in turn a partition of a larger index, validating it
 	 * might cause the parent to become valid also.  Try that.
@@ -17504,9 +17507,6 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		Relation	parentIdx,
 					parentTbl;
 
-		/* make sure we see the validation we just did */
-		CommandCounterIncrement();
-
 		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
 		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index b0f02bc1f6..2a7d9b463c 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -429,6 +429,12 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			attachinfo[k].parentIdx = parentidx;
 			attachinfo[k].partitionIdx = index;
 
+			/*
+			 * We want dependencies from parent to partition (so that the
+			 * partition index is created first)
+			 */
+			addObjectDependency(&parentidx->dobj, index->dobj.dumpId);
+
 			/*
 			 * We must state the DO_INDEX_ATTACH object's dependencies
 			 * explicitly, since it will not match anything in pg_depend.
@@ -446,6 +452,8 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			 */
 			addObjectDependency(&attachinfo[k].dobj, index->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj, parentidx->dobj.dumpId);
+			// addObjectDependency(&parentidx->dobj, attachinfo[k].dobj.dumpId);
+
 			addObjectDependency(&attachinfo[k].dobj,
 								index->indextable->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj,
-- 
2.17.0

Justin Pryzby

pryzby@telsasoft.com

almost 5 years ago

In reply to: Justin Pryzby (#5)

7 attachment(s)

Re: CLUSTER on partitioned index

On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

--
Justin

Attachments:

v7-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload

From 1101d6b8a44f5cc170816f305476750352dc06de Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 26 Nov 2020 14:37:08 -0600
Subject: [PATCH v7 1/7] pg_dump: make CLUSTER ON a separate dump object..

..since it needs to be restored after any child indexes are restored *and
attached*.  The order needs to be:

1) restore child and parent index (order doesn't matter);
2) attach child index;
3) set cluster on child and parent index (order doesn't matter);
---
 src/bin/pg_dump/pg_dump.c      | 86 ++++++++++++++++++++++++++--------
 src/bin/pg_dump/pg_dump.h      |  8 ++++
 src/bin/pg_dump/pg_dump_sort.c |  8 ++++
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index d99b61e621..8dc8a42964 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -208,6 +208,7 @@ static void dumpSequence(Archive *fout, TableInfo *tbinfo);
 static void dumpSequenceData(Archive *fout, TableDataInfo *tdinfo);
 static void dumpIndex(Archive *fout, IndxInfo *indxinfo);
 static void dumpIndexAttach(Archive *fout, IndexAttachInfo *attachinfo);
+static void dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo);
 static void dumpStatisticsExt(Archive *fout, StatsExtInfo *statsextinfo);
 static void dumpConstraint(Archive *fout, ConstraintInfo *coninfo);
 static void dumpTableConstraintComment(Archive *fout, ConstraintInfo *coninfo);
@@ -7092,6 +7093,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				i_inddependcollversions;
 	int			ntups;
 
+	int	ncluster = 0;
+	IndexClusterInfo *clusterinfo;
+	clusterinfo = (IndexClusterInfo *)
+		pg_malloc0(numTables * sizeof(IndexClusterInfo));
+
 	for (i = 0; i < numTables; i++)
 	{
 		TableInfo  *tbinfo = &tblinfo[i];
@@ -7471,6 +7477,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				/* Plain secondary index */
 				indxinfo[j].indexconstraint = 0;
 			}
+
+			/* Record each table's CLUSTERed index, if any */
+			if (indxinfo[j].indisclustered)
+			{
+				IndxInfo   *index = &indxinfo[j];
+				IndexClusterInfo *cluster = &clusterinfo[ncluster];
+
+				cluster->dobj.objType = DO_INDEX_CLUSTER_ON;
+				cluster->dobj.catId.tableoid = 0;
+				cluster->dobj.catId.oid = 0;
+				AssignDumpId(&cluster->dobj);
+				cluster->dobj.name = pg_strdup(index->dobj.name);
+				cluster->dobj.namespace = index->indextable->dobj.namespace;
+				cluster->index = index;
+				cluster->indextable = &tblinfo[i];
+
+				/* The CLUSTER ON depends on its index.. */
+				addObjectDependency(&cluster->dobj, index->dobj.dumpId);
+
+				ncluster++;
+			}
 		}
 
 		PQclear(res);
@@ -10323,6 +10350,9 @@ dumpDumpableObject(Archive *fout, DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (SubscriptionInfo *) dobj);
 			break;
+		case DO_INDEX_CLUSTER_ON:
+			dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -16543,6 +16573,41 @@ getAttrName(int attrnum, TableInfo *tblInfo)
 	return NULL;				/* keep compiler quiet */
 }
 
+/*
+ * dumpIndexClusterOn
+ *	  record that the index is clustered.
+ */
+static void
+dumpIndexClusterOn(Archive *fout, IndexClusterInfo *clusterinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+	TableInfo	*tbinfo = clusterinfo->indextable;
+	char		*qindxname;
+	PQExpBuffer	q;
+
+	if (dopt->dataOnly)
+		return;
+
+	q = createPQExpBuffer();
+	qindxname = pg_strdup(fmtId(clusterinfo->dobj.name));
+
+	/* index name is not qualified in this syntax */
+	appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER ON %s;\n",
+					  fmtQualifiedDumpable(tbinfo), qindxname);
+
+	if (clusterinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+		ArchiveEntry(fout, clusterinfo->dobj.catId, clusterinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = clusterinfo->dobj.name,
+								  .namespace = tbinfo->dobj.namespace->dobj.name,
+								  .owner = tbinfo->rolname,
+								  .description = "INDEX CLUSTER ON",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = q->data));
+
+	destroyPQExpBuffer(q);
+	free(qindxname);
+}
+
 /*
  * dumpIndex
  *	  write out to fout a user-defined index
@@ -16597,16 +16662,6 @@ dumpIndex(Archive *fout, IndxInfo *indxinfo)
 		 * similar code in dumpConstraint!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  qindxname);
-		}
-
 		/*
 		 * If the index has any statistics on some of its columns, generate
 		 * the associated ALTER INDEX queries.
@@ -16933,16 +16988,6 @@ dumpConstraint(Archive *fout, ConstraintInfo *coninfo)
 		 * similar code in dumpIndex!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  fmtId(indxinfo->dobj.name));
-		}
-
 		/* If the index defines identity, we need to record that. */
 		if (indxinfo->indisreplident)
 		{
@@ -18448,6 +18493,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 				break;
 			case DO_INDEX:
 			case DO_INDEX_ATTACH:
+			case DO_INDEX_CLUSTER_ON:
 			case DO_STATSEXT:
 			case DO_REFRESH_MATVIEW:
 			case DO_TRIGGER:
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 1290f9659b..57def4c009 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -54,6 +54,7 @@ typedef enum
 	DO_ATTRDEF,
 	DO_INDEX,
 	DO_INDEX_ATTACH,
+	DO_INDEX_CLUSTER_ON,
 	DO_STATSEXT,
 	DO_RULE,
 	DO_TRIGGER,
@@ -386,6 +387,13 @@ typedef struct _indxInfo
 	DumpId		indexconstraint;
 } IndxInfo;
 
+typedef struct _indexClusterInfo
+{
+	DumpableObject dobj;
+	TableInfo  *indextable;		/* link to table the index is for */
+	IndxInfo   *index;		/* link to index itself */
+} IndexClusterInfo;
+
 typedef struct _indexAttachInfo
 {
 	DumpableObject dobj;
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 46461fb6a1..dd5b233196 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -75,6 +75,7 @@ enum dbObjectTypePriorities
 	PRIO_CONSTRAINT,
 	PRIO_INDEX,
 	PRIO_INDEX_ATTACH,
+	PRIO_INDEX_CLUSTER_ON,
 	PRIO_STATSEXT,
 	PRIO_RULE,
 	PRIO_TRIGGER,
@@ -108,6 +109,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_ATTRDEF,				/* DO_ATTRDEF */
 	PRIO_INDEX,					/* DO_INDEX */
 	PRIO_INDEX_ATTACH,			/* DO_INDEX_ATTACH */
+	PRIO_INDEX_CLUSTER_ON,			/* DO_INDEX_CLUSTER_ON */
 	PRIO_STATSEXT,				/* DO_STATSEXT */
 	PRIO_RULE,					/* DO_RULE */
 	PRIO_TRIGGER,				/* DO_TRIGGER */
@@ -136,6 +138,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+
 };
 
 StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
@@ -1348,6 +1351,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "INDEX ATTACH %s  (ID %d)",
 					 obj->name, obj->dumpId);
 			return;
+		case DO_INDEX_CLUSTER_ON:
+			snprintf(buf, bufsize,
+					 "INDEX CLUSTER ON %s  (ID %d)",
+					 obj->name, obj->dumpId);
+			return;
 		case DO_STATSEXT:
 			snprintf(buf, bufsize,
 					 "STATISTICS %s  (ID %d OID %u)",
-- 
2.17.0

v7-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From 0004939e44906d7b32f5f6239109cc25ac72e301 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v7 2/7] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 174 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/commands/cluster.h        |   1 +
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 209 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 5dd21a0189..fb5deddb35 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -192,6 +192,12 @@ CLUSTER [VERBOSE]
     are periodically reclustered.
    </para>
 
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    index partition of the given partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 096a06f7b3..9b6673867c 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -135,7 +140,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -146,14 +151,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -189,10 +186,34 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		}
 
 		/* close relation, keep lock till commit */
-		table_close(rel, NoLock);
+		table_close(rel, ShareUpdateExclusiveLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, &params);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, &params);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, params.options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -202,7 +223,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -225,28 +245,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-			ClusterParams cluster_params = params;
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_params.options |= CLUOPT_RECHECK;
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						&cluster_params);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -352,9 +351,10 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -398,8 +398,13 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -415,6 +420,14 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -483,6 +496,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -507,12 +523,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -584,10 +594,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1582,3 +1588,77 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels are also processed by cluster_rel, to
+		 * call check_index_is_clusterable() and mark_index_clustered().
+		 */
+
+		/*
+		 * We have to build the list in a different memory context so it will
+		 * survive the cross-transaction processing
+		 */
+		old_context = MemoryContextSwitchTo(cluster_context);
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+
+		MemoryContextSwitchTo(old_context);
+	}
+
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+		ClusterParams cluster_params = { .options = options, };
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_params.options |= CLUOPT_RECHECK;
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					&cluster_params);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a75647b1cc..781aa95abc 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -588,6 +588,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index a941f2accd..c30ca01726 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -22,6 +22,7 @@
 /* flag bits for ClusterParams->flags */
 #define CLUOPT_RECHECK 0x01		/* recheck relation state */
 #define CLUOPT_VERBOSE 0x02		/* print progress info */
+#define CLUOPT_RECHECK_ISCLUSTERED 0x04	/* recheck relation state for indisclustered */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..c74cfa88cc 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -439,14 +439,62 @@ select * from clstr_temp;
 
 drop table clstr_temp;
 RESET SESSION AUTHORIZATION;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT relname, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY relname COLLATE "C";
+      relname      | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..9bcc77695c 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -196,12 +196,30 @@ drop table clstr_temp;
 
 RESET SESSION AUTHORIZATION;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+-- Check that clustering sets new indisclustered:
+SELECT relname, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY relname COLLATE "C";
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v7-0003-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload

From 4995e70a192989ecda4d1b3798c0d00ba47f06c4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v7 3/7] Propagate changes to indisclustered to child/parents

---
 src/backend/commands/cluster.c        | 109 ++++++++++++++++----------
 src/backend/commands/indexcmds.c      |   2 +
 src/test/regress/expected/cluster.out |  46 +++++++++++
 src/test/regress/sql/cluster.sql      |  11 +++
 4 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 9b6673867c..78c2c2ba72 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -74,6 +74,7 @@ static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -510,66 +511,88 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
- * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ * With indexOid == InvalidOid, mark all indexes of rel not-clustered.
+ * Otherwise, mark children of the clustered index as clustered, and parents of
+ * other indexes as unclustered.
+ * We wish to maintain the following properties:
+ * 1) Only one index on a relation can be marked clustered at once
+ * 2) If a partitioned index is clustered, then all its children must be
+ *     clustered.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
-	Relation	pg_index;
-	ListCell   *index;
-
-	/*
-	 * If the index is already marked clustered, no need to do anything.
-	 */
-	if (OidIsValid(indexOid))
-	{
-		if (get_index_isclustered(indexOid))
-			return;
-	}
+	ListCell	*lc, *lc2;
+	List		*indexes;
+	Relation	pg_index = table_open(IndexRelationId, RowExclusiveLock);
+	List		*inh = find_all_inheritors(RelationGetRelid(rel), ShareRowExclusiveLock, NULL);
 
 	/*
 	 * Check each index of the relation and set/clear the bit as needed.
+	 * Iterate over the relation's children rather than the index's children
+	 * since we need to unset cluster for indexes on intermediate children,
+	 * too.
 	 */
-	pg_index = table_open(IndexRelationId, RowExclusiveLock);
-
-	foreach(index, RelationGetIndexList(rel))
+	foreach(lc, inh)
 	{
-		Oid			thisIndexOid = lfirst_oid(index);
-
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+		Oid			inhrelid = lfirst_oid(lc);
+		Relation	thisrel = table_open(inhrelid, ShareRowExclusiveLock);
 
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
+		indexes = RelationGetIndexList(thisrel);
+		foreach (lc2, indexes)
 		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-		else if (thisIndexOid == indexOid)
-		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
+			bool	isclustered;
+			Oid		thisIndexOid = lfirst_oid(lc2);
+			List	*parentoids = get_rel_relispartition(thisIndexOid) ?
+				get_partition_ancestors(thisIndexOid) : NIL;
 
-		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
-									 InvalidOid, is_internal);
+			/*
+			 * A child of the clustered index must be set clustered;
+			 * indexes which are not children of the clustered index are
+			 * set unclustered
+			 */
+			isclustered = (thisIndexOid == indexOid) ||
+					list_member_oid(parentoids, indexOid);
+			Assert(OidIsValid(indexOid) || !isclustered);
+			set_indisclustered(thisIndexOid, isclustered, pg_index);
+
+			InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
+										 InvalidOid, is_internal);
+		}
 
-		heap_freetuple(indexTuple);
+		list_free(indexes);
+		table_close(thisrel, ShareRowExclusiveLock);
 	}
+	list_free(inh);
 
 	table_close(pg_index, RowExclusiveLock);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 127ba7835d..4ca1ffbfa4 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -25,6 +25,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -32,6 +33,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index c74cfa88cc..1d436dfaae 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,52 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a)
+
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
+       Partitioned table "public.clstrpart1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (1) TO (10)
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart1_a_idx" btree (a)
+    "clstrpart1_idx_2" btree (a) CLUSTER
+Number of partitions: 2 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 9bcc77695c..0ded2be1ca 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -220,6 +220,17 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v7-0004-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload

From c8afb42bc44ab8cc6349bd53614bb6da72da7afa Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 18:58:03 -0600
Subject: [PATCH v7 4/7] Invalidate parent indexes

---
 src/backend/commands/cluster.c        | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 26 ++++++++++++++++++++++++++
 src/test/regress/sql/cluster.sql      |  8 ++++++++
 3 files changed, 55 insertions(+)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 78c2c2ba72..05c84e5582 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -594,6 +594,27 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	}
 	list_free(inh);
 
+	/*
+	 * Set parent of all indexes as unclustered when a rel is unclustered; and,
+	 * when an index is clustered, set parents of all /other/ indexes as
+	 * unclustered.
+	 */
+	indexes = RelationGetIndexList(rel);
+	foreach (lc, indexes)
+	{
+		Oid	thisIndexOid = lfirst_oid(lc);
+
+		if (thisIndexOid == indexOid)
+			continue;
+
+		while (get_rel_relispartition(thisIndexOid))
+		{
+			thisIndexOid = get_partition_parent(thisIndexOid);
+			set_indisclustered(thisIndexOid, false, pg_index);
+		}
+	}
+	list_free(indexes);
+
 	table_close(pg_index, RowExclusiveLock);
 }
 
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 1d436dfaae..6cba3cc4f9 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -541,6 +541,32 @@ Indexes:
     "clstrpart1_idx_2" btree (a) CLUSTER
 Number of partitions: 2 (Use \d+ to list them.)
 
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 0ded2be1ca..a1d132f288 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -231,6 +231,14 @@ CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
 \d clstrpart1
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v7-0005-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload

From e92499f78950405b1b8f92c6240909335bdebc17 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 19:11:41 -0600
Subject: [PATCH v7 5/7] Invalidate parent index cluster on attach

---
 src/backend/commands/indexcmds.c      | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 14 ++++++++++++++
 src/test/regress/sql/cluster.sql      |  5 +++++
 3 files changed, 40 insertions(+)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 4ca1ffbfa4..cc57c149ed 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4117,6 +4117,27 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/*
+	 * If the attached index is not clustered, invalidate cluster mark on
+	 * any parents
+	 */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 6cba3cc4f9..e7f0889743 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -567,6 +567,20 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 4 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index a1d132f288..3c8085c69e 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -239,6 +239,11 @@ CLUSTER clstrpart1 USING clstrpart1_idx_2;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
 \d clstrpart
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v7-0006-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload

From e91b525beafbc93223807c68d557b95f7dce3e94 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v7 6/7] Preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  2 +-
 src/test/regress/expected/cluster.out | 12 ++++++++++++
 src/test/regress/sql/cluster.sql      |  4 ++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1cb9172a5f..3611da45e6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -604,7 +604,7 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) && get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e7f0889743..a3943c13f5 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -581,6 +581,18 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 4 (Use \d+ to list them.)
 
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 3c8085c69e..22628e90ca 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -244,6 +244,10 @@ ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
 ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
 \d clstrpart
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v7-0007-pg_dump-partitioned-index-depend-on-its-partition.patchtext/x-diff; charset=us-asciiDownload

From 9ac8ebd70cc50132f16272df3eae7878b69d6082 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 25 Nov 2020 17:34:07 -0600
Subject: [PATCH v7 7/7] pg_dump: partitioned index depend on its partitions

This is required for restoring clustered parent index, which is marked INVALID
until indexes have been built on all its child tables, and it's prohibited to
CLUSTER ON an INVALID index

See also: 8cff4f5348d075e063100071013f00a900c32b0f
---
 src/backend/commands/tablecmds.c | 6 +++---
 src/bin/pg_dump/common.c         | 8 ++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 420991e315..a478c13990 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17521,6 +17521,9 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		table_close(idxRel, RowExclusiveLock);
 	}
 
+	/* make sure we see the validation we just did */
+	CommandCounterIncrement();
+
 	/*
 	 * If this index is in turn a partition of a larger index, validating it
 	 * might cause the parent to become valid also.  Try that.
@@ -17532,9 +17535,6 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		Relation	parentIdx,
 					parentTbl;
 
-		/* make sure we see the validation we just did */
-		CommandCounterIncrement();
-
 		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
 		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a261a5545..cdfba058fc 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -429,6 +429,12 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			attachinfo[k].parentIdx = parentidx;
 			attachinfo[k].partitionIdx = index;
 
+			/*
+			 * We want dependencies from parent to partition (so that the
+			 * partition index is created first)
+			 */
+			addObjectDependency(&parentidx->dobj, index->dobj.dumpId);
+
 			/*
 			 * We must state the DO_INDEX_ATTACH object's dependencies
 			 * explicitly, since it will not match anything in pg_depend.
@@ -446,6 +452,8 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			 */
 			addObjectDependency(&attachinfo[k].dobj, index->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj, parentidx->dobj.dumpId);
+			// addObjectDependency(&parentidx->dobj, attachinfo[k].dobj.dumpId);
+
 			addObjectDependency(&attachinfo[k].dobj,
 								index->indextable->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj,
-- 
2.17.0

Zhihong Yu

zyu@yugabyte.com

almost 5 years ago

In reply to: Justin Pryzby (#6)

Re: CLUSTER on partitioned index

Hi,
For v7-0002-Implement-CLUSTER-of-partitioned-table.patch:

+        * We have to build the list in a different memory context so it
will
+        * survive the cross-transaction processing
+        */
+       old_context = MemoryContextSwitchTo(cluster_context);

cluster_context is not modified within the loop. Can the memory context
switching code be moved outside the loop ?

Cheers

On Sat, Feb 6, 2021 at 6:46 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

Show quoted text

On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which

preserves

indisclustered on children of clustered, partitioned indexes,

and invalidates

indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new

while we

discuss it.

This fixes some omissions in the previous patch and error in its

test cases.

CLUSTER ON recurses to children, since I think a clustered parent

index means

that all its child indexes are clustered. "SET WITHOUT CLUSTER"

doesn't have

to recurse to children, but I did it like that for consistency and

it avoids

the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered,

parent index,

since it's marked INVALID until indexes have been built on all child

tables, so

CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow

attaching

the child index (thereby making the parent "valid") to happen before

SET

CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

--
Justin

Justin Pryzby

pryzby@telsasoft.com

almost 5 years ago

In reply to: Justin Pryzby (#6)

8 attachment(s)

Re: CLUSTER on partitioned index

On Sat, Feb 06, 2021 at 08:45:49AM -0600, Justin Pryzby wrote:

On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

It occured to me that progress reporting should expose this.

I did this in the style of pg_stat_progress_create_index, adding columns
partitions_total and partitions_done showing the overall progress. The progress
of individual partitions is also visible in {blocks,tuples}_{done,total}.
This seems odd, but that's how the index view behaves.

--
Justin

Attachments:

v8-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload

From 1614cfa411dabd74faa64dc805f52405e2572239 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 26 Nov 2020 14:37:08 -0600
Subject: [PATCH v8 1/8] pg_dump: make CLUSTER ON a separate dump object..

..since it needs to be restored after any child indexes are restored *and
attached*.  The order needs to be:

1) restore child and parent index (order doesn't matter);
2) attach child index;
3) set cluster on child and parent index (order doesn't matter);
---
 src/bin/pg_dump/pg_dump.c      | 86 ++++++++++++++++++++++++++--------
 src/bin/pg_dump/pg_dump.h      |  8 ++++
 src/bin/pg_dump/pg_dump_sort.c |  8 ++++
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index eb988d7eb4..e93d2eb828 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -208,6 +208,7 @@ static void dumpSequence(Archive *fout, const TableInfo *tbinfo);
 static void dumpSequenceData(Archive *fout, const TableDataInfo *tdinfo);
 static void dumpIndex(Archive *fout, const IndxInfo *indxinfo);
 static void dumpIndexAttach(Archive *fout, const IndexAttachInfo *attachinfo);
+static void dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo);
 static void dumpStatisticsExt(Archive *fout, const StatsExtInfo *statsextinfo);
 static void dumpConstraint(Archive *fout, const ConstraintInfo *coninfo);
 static void dumpTableConstraintComment(Archive *fout, const ConstraintInfo *coninfo);
@@ -7092,6 +7093,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				i_inddependcollversions;
 	int			ntups;
 
+	int	ncluster = 0;
+	IndexClusterInfo *clusterinfo;
+	clusterinfo = (IndexClusterInfo *)
+		pg_malloc0(numTables * sizeof(IndexClusterInfo));
+
 	for (i = 0; i < numTables; i++)
 	{
 		TableInfo  *tbinfo = &tblinfo[i];
@@ -7471,6 +7477,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				/* Plain secondary index */
 				indxinfo[j].indexconstraint = 0;
 			}
+
+			/* Record each table's CLUSTERed index, if any */
+			if (indxinfo[j].indisclustered)
+			{
+				IndxInfo   *index = &indxinfo[j];
+				IndexClusterInfo *cluster = &clusterinfo[ncluster];
+
+				cluster->dobj.objType = DO_INDEX_CLUSTER_ON;
+				cluster->dobj.catId.tableoid = 0;
+				cluster->dobj.catId.oid = 0;
+				AssignDumpId(&cluster->dobj);
+				cluster->dobj.name = pg_strdup(index->dobj.name);
+				cluster->dobj.namespace = index->indextable->dobj.namespace;
+				cluster->index = index;
+				cluster->indextable = &tblinfo[i];
+
+				/* The CLUSTER ON depends on its index.. */
+				addObjectDependency(&cluster->dobj, index->dobj.dumpId);
+
+				ncluster++;
+			}
 		}
 
 		PQclear(res);
@@ -10323,6 +10350,9 @@ dumpDumpableObject(Archive *fout, const DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_INDEX_CLUSTER_ON:
+			dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -16543,6 +16573,41 @@ getAttrName(int attrnum, const TableInfo *tblInfo)
 	return NULL;				/* keep compiler quiet */
 }
 
+/*
+ * dumpIndexClusterOn
+ *	  record that the index is clustered.
+ */
+static void
+dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+	TableInfo	*tbinfo = clusterinfo->indextable;
+	char		*qindxname;
+	PQExpBuffer	q;
+
+	if (dopt->dataOnly)
+		return;
+
+	q = createPQExpBuffer();
+	qindxname = pg_strdup(fmtId(clusterinfo->dobj.name));
+
+	/* index name is not qualified in this syntax */
+	appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER ON %s;\n",
+					  fmtQualifiedDumpable(tbinfo), qindxname);
+
+	if (clusterinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+		ArchiveEntry(fout, clusterinfo->dobj.catId, clusterinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = clusterinfo->dobj.name,
+								  .namespace = tbinfo->dobj.namespace->dobj.name,
+								  .owner = tbinfo->rolname,
+								  .description = "INDEX CLUSTER ON",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = q->data));
+
+	destroyPQExpBuffer(q);
+	free(qindxname);
+}
+
 /*
  * dumpIndex
  *	  write out to fout a user-defined index
@@ -16597,16 +16662,6 @@ dumpIndex(Archive *fout, const IndxInfo *indxinfo)
 		 * similar code in dumpConstraint!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  qindxname);
-		}
-
 		/*
 		 * If the index has any statistics on some of its columns, generate
 		 * the associated ALTER INDEX queries.
@@ -16933,16 +16988,6 @@ dumpConstraint(Archive *fout, const ConstraintInfo *coninfo)
 		 * similar code in dumpIndex!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  fmtId(indxinfo->dobj.name));
-		}
-
 		/* If the index defines identity, we need to record that. */
 		if (indxinfo->indisreplident)
 		{
@@ -18448,6 +18493,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 				break;
 			case DO_INDEX:
 			case DO_INDEX_ATTACH:
+			case DO_INDEX_CLUSTER_ON:
 			case DO_STATSEXT:
 			case DO_REFRESH_MATVIEW:
 			case DO_TRIGGER:
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 0a2213fb06..627c8fbdab 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -54,6 +54,7 @@ typedef enum
 	DO_ATTRDEF,
 	DO_INDEX,
 	DO_INDEX_ATTACH,
+	DO_INDEX_CLUSTER_ON,
 	DO_STATSEXT,
 	DO_RULE,
 	DO_TRIGGER,
@@ -386,6 +387,13 @@ typedef struct _indxInfo
 	DumpId		indexconstraint;
 } IndxInfo;
 
+typedef struct _indexClusterInfo
+{
+	DumpableObject dobj;
+	TableInfo  *indextable;		/* link to table the index is for */
+	IndxInfo   *index;		/* link to index itself */
+} IndexClusterInfo;
+
 typedef struct _indexAttachInfo
 {
 	DumpableObject dobj;
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 46461fb6a1..dd5b233196 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -75,6 +75,7 @@ enum dbObjectTypePriorities
 	PRIO_CONSTRAINT,
 	PRIO_INDEX,
 	PRIO_INDEX_ATTACH,
+	PRIO_INDEX_CLUSTER_ON,
 	PRIO_STATSEXT,
 	PRIO_RULE,
 	PRIO_TRIGGER,
@@ -108,6 +109,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_ATTRDEF,				/* DO_ATTRDEF */
 	PRIO_INDEX,					/* DO_INDEX */
 	PRIO_INDEX_ATTACH,			/* DO_INDEX_ATTACH */
+	PRIO_INDEX_CLUSTER_ON,			/* DO_INDEX_CLUSTER_ON */
 	PRIO_STATSEXT,				/* DO_STATSEXT */
 	PRIO_RULE,					/* DO_RULE */
 	PRIO_TRIGGER,				/* DO_TRIGGER */
@@ -136,6 +138,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+
 };
 
 StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
@@ -1348,6 +1351,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "INDEX ATTACH %s  (ID %d)",
 					 obj->name, obj->dumpId);
 			return;
+		case DO_INDEX_CLUSTER_ON:
+			snprintf(buf, bufsize,
+					 "INDEX CLUSTER ON %s  (ID %d)",
+					 obj->name, obj->dumpId);
+			return;
 		case DO_STATSEXT:
 			snprintf(buf, bufsize,
 					 "STATISTICS %s  (ID %d OID %u)",
-- 
2.17.0

v8-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From 15c082c0b3b987386eaab8785403ef5ca9c09fba Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v8 2/8] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 174 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/commands/cluster.h        |   1 +
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 209 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 5dd21a0189..fb5deddb35 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -192,6 +192,12 @@ CLUSTER [VERBOSE]
     are periodically reclustered.
    </para>
 
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    index partition of the given partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 096a06f7b3..9b6673867c 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -135,7 +140,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -146,14 +151,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -189,10 +186,34 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		}
 
 		/* close relation, keep lock till commit */
-		table_close(rel, NoLock);
+		table_close(rel, ShareUpdateExclusiveLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, &params);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, &params);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, params.options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -202,7 +223,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -225,28 +245,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-			ClusterParams cluster_params = params;
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_params.options |= CLUOPT_RECHECK;
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						&cluster_params);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -352,9 +351,10 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -398,8 +398,13 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -415,6 +420,14 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -483,6 +496,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -507,12 +523,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -584,10 +594,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1582,3 +1588,77 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels are also processed by cluster_rel, to
+		 * call check_index_is_clusterable() and mark_index_clustered().
+		 */
+
+		/*
+		 * We have to build the list in a different memory context so it will
+		 * survive the cross-transaction processing
+		 */
+		old_context = MemoryContextSwitchTo(cluster_context);
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+
+		MemoryContextSwitchTo(old_context);
+	}
+
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+		ClusterParams cluster_params = { .options = options, };
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_params.options |= CLUOPT_RECHECK;
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					&cluster_params);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 1e1c315bae..ba301a6b08 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -588,6 +588,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index a941f2accd..c30ca01726 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -22,6 +22,7 @@
 /* flag bits for ClusterParams->flags */
 #define CLUOPT_RECHECK 0x01		/* recheck relation state */
 #define CLUOPT_VERBOSE 0x02		/* print progress info */
+#define CLUOPT_RECHECK_ISCLUSTERED 0x04	/* recheck relation state for indisclustered */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..c74cfa88cc 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -439,14 +439,62 @@ select * from clstr_temp;
 
 drop table clstr_temp;
 RESET SESSION AUTHORIZATION;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT relname, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY relname COLLATE "C";
+      relname      | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..9bcc77695c 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -196,12 +196,30 @@ drop table clstr_temp;
 
 RESET SESSION AUTHORIZATION;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+-- Check that clustering sets new indisclustered:
+SELECT relname, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY relname COLLATE "C";
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v8-0003-f-progress-reporting.patchtext/x-diff; charset=us-asciiDownload

From 2b0b1d79ba0e0f54b348d77a1650d674b8dd8de6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Mon, 8 Feb 2021 20:45:26 -0600
Subject: [PATCH v8 3/8] f! progress reporting..

This follows the precedent of pg_stat_progress_create_index, which
simultaneously shows the progress of 1) the leaf partition, and 2) the overall
progress as partitions_done / partitions_total.

This also includes current_child_index_relid, but the create_index view
doesn't, so maybe this shouldn't, either.
---
 src/backend/catalog/system_views.sql |  4 +-
 src/backend/commands/cluster.c       | 64 +++++++++++++++++-----------
 src/include/commands/progress.h      |  3 ++
 src/test/regress/expected/rules.out  |  4 +-
 4 files changed, 47 insertions(+), 28 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fa58afd9d7..9a3cb5f935 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1069,7 +1069,9 @@ CREATE VIEW pg_stat_progress_cluster AS
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS index_rebuild_count,
+        S.param18 AS partitions_total,
+        S.param19 AS partitions_done
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 9b6673867c..cb4fc350c6 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -77,7 +77,7 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
-static void cluster_multiple_rels(List *rvs, int options);
+static void cluster_multiple_rels(List *rvs, int options, bool ispartitioned);
 
 
 /*---------------------------------------------------------------------------
@@ -188,16 +188,28 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		/* close relation, keep lock till commit */
 		table_close(rel, ShareUpdateExclusiveLock);
 
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
 		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 		{
+			pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+					OidIsValid(indexOid) ? PROGRESS_CLUSTER_COMMAND_CLUSTER :
+					PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+
 			/* Do the job. */
 			cluster_rel(tableOid, indexOid, &params);
+			pgstat_progress_end_command();
 		}
 		else
 		{
 			List	   *rvs;
 			MemoryContext cluster_context;
 
+			int64		progress_val[2];
+			const int	progress_index[] = {
+				PROGRESS_CLUSTER_COMMAND,
+				PROGRESS_CLUSTER_PARTITIONS_TOTAL,
+			};
+
 			/* Refuse to hold strong locks in a user transaction */
 			PreventInTransactionBlock(isTopLevel, "CLUSTER");
 
@@ -206,7 +218,20 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 												ALLOCSET_DEFAULT_SIZES);
 
 			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-			cluster_multiple_rels(rvs, params.options);
+
+			/* Report vacuum full or cluster and number of partitions */
+			progress_val[0] = OidIsValid(indexOid) ?
+				PROGRESS_CLUSTER_COMMAND_CLUSTER :
+				PROGRESS_CLUSTER_COMMAND_VACUUM_FULL;
+			progress_val[1] = list_length(rvs);
+// This is currently showing the total number of *tables*, including the
+// parent, and intermediate partitions, so the column should be renamed, or
+// we should count the number of leaf partitions
+
+			pgstat_progress_update_multi_param(2, progress_index, progress_val);
+
+			cluster_multiple_rels(rvs, params.options, true);
+			pgstat_progress_end_command();
 
 			/* Start a new transaction for the cleanup work. */
 			StartTransactionCommand();
@@ -245,7 +270,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED, false);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -282,14 +307,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
-
 	/*
 	 * We grab exclusive access to the target rel and index for the duration
 	 * of the transaction.  (This is redundant for the single-transaction
@@ -300,10 +317,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* If the table has gone away, we can skip processing it */
 	if (!OldHeap)
-	{
-		pgstat_progress_end_command();
 		return;
-	}
 
 	/*
 	 * Since we may open a new transaction for each relation, we have to check
@@ -319,7 +333,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		if (!pg_class_ownercheck(tableOid, GetUserId()))
 		{
 			relation_close(OldHeap, AccessExclusiveLock);
-			pgstat_progress_end_command();
 			return;
 		}
 
@@ -334,7 +347,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		if (RELATION_IS_OTHER_TEMP(OldHeap))
 		{
 			relation_close(OldHeap, AccessExclusiveLock);
-			pgstat_progress_end_command();
 			return;
 		}
 
@@ -346,7 +358,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
-				pgstat_progress_end_command();
 				return;
 			}
 
@@ -357,7 +368,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
-				pgstat_progress_end_command();
 				return;
 			}
 		}
@@ -416,7 +426,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		!RelationIsPopulated(OldHeap))
 	{
 		relation_close(OldHeap, AccessExclusiveLock);
-		pgstat_progress_end_command();
 		return;
 	}
 
@@ -424,7 +433,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
 	{
 		relation_close(OldHeap, AccessExclusiveLock);
-		pgstat_progress_end_command();
 		return;
 	}
 
@@ -440,8 +448,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	rebuild_relation(OldHeap, indexOid, verbose);
 
 	/* NB: rebuild_relation does table_close() on OldHeap */
-
-	pgstat_progress_end_command();
 }
 
 /*
@@ -1366,7 +1372,7 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	ReindexParams reindex_params = {0};
 	int			i;
 
-	/* Report that we are now swapping relation files */
+	/* Report that we are now swapping relation files XXX */
 	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 								 PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES);
 
@@ -1417,13 +1423,13 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
 		reindex_flags |= REINDEX_REL_FORCE_INDEXES_PERMANENT;
 
-	/* Report that we are now reindexing relations */
+	/* Report that we are now reindexing relations XXX */
 	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 								 PROGRESS_CLUSTER_PHASE_REBUILD_INDEX);
 
 	reindex_relation(OIDOldHeap, reindex_flags, &reindex_params);
 
-	/* Report that we are now doing clean up */
+	/* Report that we are now doing clean up XXX */
 	pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
 								 PROGRESS_CLUSTER_PHASE_FINAL_CLEANUP);
 
@@ -1633,9 +1639,10 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 
 /* Cluster each relation in a separate transaction */
 static void
-cluster_multiple_rels(List *rvs, int options)
+cluster_multiple_rels(List *rvs, int options, bool ispartitioned)
 {
 	ListCell *lc;
+	off_t	  childnum = 0;
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
@@ -1658,6 +1665,11 @@ cluster_multiple_rels(List *rvs, int options)
 		cluster_rel(rvtc->tableOid, rvtc->indexOid,
 					&cluster_params);
 
+		if (ispartitioned)
+			pgstat_progress_update_param(PROGRESS_CLUSTER_PARTITIONS_DONE,
+					childnum++);
+
+
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 95ec5d02e9..13499a3e72 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -60,6 +60,9 @@
 #define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
 #define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
 #define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/* Need to avoid the CREATE INDEX params */
+#define PROGRESS_CLUSTER_PARTITIONS_TOTAL	17
+#define PROGRESS_CLUSTER_PARTITIONS_DONE	18
 
 /* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
 #define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b632d9f2ea..d1fea532bc 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1941,7 +1941,9 @@ pg_stat_progress_cluster| SELECT s.pid,
     s.param5 AS heap_tuples_written,
     s.param6 AS heap_blks_total,
     s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS index_rebuild_count,
+    s.param18 AS partitions_total,
+    s.param19 AS partitions_done
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
-- 
2.17.0

v8-0004-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload

From 8cfd38f0f4752149a4ee7d40ec5866941968b17f Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v8 4/8] Propagate changes to indisclustered to child/parents

---
 src/backend/commands/cluster.c        | 109 ++++++++++++++++----------
 src/backend/commands/indexcmds.c      |   2 +
 src/test/regress/expected/cluster.out |  46 +++++++++++
 src/test/regress/sql/cluster.sql      |  11 +++
 4 files changed, 125 insertions(+), 43 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index cb4fc350c6..5c08f0642e 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -74,6 +74,7 @@ static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -516,66 +517,88 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
- * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ * With indexOid == InvalidOid, mark all indexes of rel not-clustered.
+ * Otherwise, mark children of the clustered index as clustered, and parents of
+ * other indexes as unclustered.
+ * We wish to maintain the following properties:
+ * 1) Only one index on a relation can be marked clustered at once
+ * 2) If a partitioned index is clustered, then all its children must be
+ *     clustered.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
-	Relation	pg_index;
-	ListCell   *index;
-
-	/*
-	 * If the index is already marked clustered, no need to do anything.
-	 */
-	if (OidIsValid(indexOid))
-	{
-		if (get_index_isclustered(indexOid))
-			return;
-	}
+	ListCell	*lc, *lc2;
+	List		*indexes;
+	Relation	pg_index = table_open(IndexRelationId, RowExclusiveLock);
+	List		*inh = find_all_inheritors(RelationGetRelid(rel), ShareRowExclusiveLock, NULL);
 
 	/*
 	 * Check each index of the relation and set/clear the bit as needed.
+	 * Iterate over the relation's children rather than the index's children
+	 * since we need to unset cluster for indexes on intermediate children,
+	 * too.
 	 */
-	pg_index = table_open(IndexRelationId, RowExclusiveLock);
-
-	foreach(index, RelationGetIndexList(rel))
+	foreach(lc, inh)
 	{
-		Oid			thisIndexOid = lfirst_oid(index);
-
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+		Oid			inhrelid = lfirst_oid(lc);
+		Relation	thisrel = table_open(inhrelid, ShareRowExclusiveLock);
 
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
+		indexes = RelationGetIndexList(thisrel);
+		foreach (lc2, indexes)
 		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-		else if (thisIndexOid == indexOid)
-		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
+			bool	isclustered;
+			Oid		thisIndexOid = lfirst_oid(lc2);
+			List	*parentoids = get_rel_relispartition(thisIndexOid) ?
+				get_partition_ancestors(thisIndexOid) : NIL;
 
-		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
-									 InvalidOid, is_internal);
+			/*
+			 * A child of the clustered index must be set clustered;
+			 * indexes which are not children of the clustered index are
+			 * set unclustered
+			 */
+			isclustered = (thisIndexOid == indexOid) ||
+					list_member_oid(parentoids, indexOid);
+			Assert(OidIsValid(indexOid) || !isclustered);
+			set_indisclustered(thisIndexOid, isclustered, pg_index);
+
+			InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
+										 InvalidOid, is_internal);
+		}
 
-		heap_freetuple(indexTuple);
+		list_free(indexes);
+		table_close(thisrel, ShareRowExclusiveLock);
 	}
+	list_free(inh);
 
 	table_close(pg_index, RowExclusiveLock);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 127ba7835d..4ca1ffbfa4 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -25,6 +25,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -32,6 +33,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index c74cfa88cc..1d436dfaae 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,52 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a)
+
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
+       Partitioned table "public.clstrpart1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (1) TO (10)
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart1_a_idx" btree (a)
+    "clstrpart1_idx_2" btree (a) CLUSTER
+Number of partitions: 2 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 9bcc77695c..0ded2be1ca 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -220,6 +220,17 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v8-0005-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload

From f72ffaf24db124cd3b16cc2c9b689ed8729ec72c Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 18:58:03 -0600
Subject: [PATCH v8 5/8] Invalidate parent indexes

---
 src/backend/commands/cluster.c        | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 26 ++++++++++++++++++++++++++
 src/test/regress/sql/cluster.sql      |  8 ++++++++
 3 files changed, 55 insertions(+)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 5c08f0642e..60272bc010 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -600,6 +600,27 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	}
 	list_free(inh);
 
+	/*
+	 * Set parent of all indexes as unclustered when a rel is unclustered; and,
+	 * when an index is clustered, set parents of all /other/ indexes as
+	 * unclustered.
+	 */
+	indexes = RelationGetIndexList(rel);
+	foreach (lc, indexes)
+	{
+		Oid	thisIndexOid = lfirst_oid(lc);
+
+		if (thisIndexOid == indexOid)
+			continue;
+
+		while (get_rel_relispartition(thisIndexOid))
+		{
+			thisIndexOid = get_partition_parent(thisIndexOid);
+			set_indisclustered(thisIndexOid, false, pg_index);
+		}
+	}
+	list_free(indexes);
+
 	table_close(pg_index, RowExclusiveLock);
 }
 
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 1d436dfaae..6cba3cc4f9 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -541,6 +541,32 @@ Indexes:
     "clstrpart1_idx_2" btree (a) CLUSTER
 Number of partitions: 2 (Use \d+ to list them.)
 
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 0ded2be1ca..a1d132f288 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -231,6 +231,14 @@ CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
 \d clstrpart1
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v8-0006-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload

From b796243672e8a201ffe1e72d2f44c8bb78947900 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 19:11:41 -0600
Subject: [PATCH v8 6/8] Invalidate parent index cluster on attach

---
 src/backend/commands/indexcmds.c      | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 14 ++++++++++++++
 src/test/regress/sql/cluster.sql      |  5 +++++
 3 files changed, 40 insertions(+)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 4ca1ffbfa4..cc57c149ed 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4117,6 +4117,27 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/*
+	 * If the attached index is not clustered, invalidate cluster mark on
+	 * any parents
+	 */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 6cba3cc4f9..e7f0889743 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -567,6 +567,20 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 4 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index a1d132f288..3c8085c69e 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -239,6 +239,11 @@ CLUSTER clstrpart1 USING clstrpart1_idx_2;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
 \d clstrpart
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v8-0007-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload

From 4cd8925f74864daea41dc29672b712a04f25eb59 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v8 7/8] Preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  2 +-
 src/test/regress/expected/cluster.out | 12 ++++++++++++
 src/test/regress/sql/cluster.sql      |  4 ++++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1514937748..08fdf02ebc 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -604,7 +604,7 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) && get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e7f0889743..a3943c13f5 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -581,6 +581,18 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 4 (Use \d+ to list them.)
 
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 3c8085c69e..22628e90ca 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -244,6 +244,10 @@ ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
 ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
 \d clstrpart
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v8-0008-pg_dump-partitioned-index-depend-on-its-partition.patchtext/x-diff; charset=us-asciiDownload

From c30b5e94555c3193299210ed9f102c4633682d54 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 25 Nov 2020 17:34:07 -0600
Subject: [PATCH v8 8/8] pg_dump: partitioned index depend on its partitions

This is required for restoring clustered parent index, which is marked INVALID
until indexes have been built on all its child tables, and it's prohibited to
CLUSTER ON an INVALID index

See also: 8cff4f5348d075e063100071013f00a900c32b0f
---
 src/backend/commands/tablecmds.c | 6 +++---
 src/bin/pg_dump/common.c         | 8 ++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 420991e315..a478c13990 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17521,6 +17521,9 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		table_close(idxRel, RowExclusiveLock);
 	}
 
+	/* make sure we see the validation we just did */
+	CommandCounterIncrement();
+
 	/*
 	 * If this index is in turn a partition of a larger index, validating it
 	 * might cause the parent to become valid also.  Try that.
@@ -17532,9 +17535,6 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		Relation	parentIdx,
 					parentTbl;
 
-		/* make sure we see the validation we just did */
-		CommandCounterIncrement();
-
 		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
 		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a261a5545..cdfba058fc 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -429,6 +429,12 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			attachinfo[k].parentIdx = parentidx;
 			attachinfo[k].partitionIdx = index;
 
+			/*
+			 * We want dependencies from parent to partition (so that the
+			 * partition index is created first)
+			 */
+			addObjectDependency(&parentidx->dobj, index->dobj.dumpId);
+
 			/*
 			 * We must state the DO_INDEX_ATTACH object's dependencies
 			 * explicitly, since it will not match anything in pg_depend.
@@ -446,6 +452,8 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			 */
 			addObjectDependency(&attachinfo[k].dobj, index->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj, parentidx->dobj.dumpId);
+			// addObjectDependency(&parentidx->dobj, attachinfo[k].dobj.dumpId);
+
 			addObjectDependency(&attachinfo[k].dobj,
 								index->indextable->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj,
-- 
2.17.0

Justin Pryzby

pryzby@telsasoft.com

almost 5 years ago

In reply to: Justin Pryzby (#8)

8 attachment(s)

Re: CLUSTER on partitioned index

On Wed, Feb 10, 2021 at 02:04:58PM -0600, Justin Pryzby wrote:

On Sat, Feb 06, 2021 at 08:45:49AM -0600, Justin Pryzby wrote:

On Mon, Jan 18, 2021 at 12:34:59PM -0600, Justin Pryzby wrote:

On Sat, Nov 28, 2020 at 08:03:02PM -0600, Justin Pryzby wrote:

On Sun, Nov 15, 2020 at 07:53:35PM -0600, Justin Pryzby wrote:

On Wed, Nov 04, 2020 at 08:23:56PM -0600, Justin Pryzby wrote:

On Tue, Oct 27, 2020 at 07:33:12PM -0500, Justin Pryzby wrote:

I'm attaching a counter-proposal to your catalog change, which preserves
indisclustered on children of clustered, partitioned indexes, and invalidates
indisclustered when attaching unclustered indexes.

..and now propagates CLUSTER ON to child indexes.

I left this as separate patches to show what I mean and what's new while we
discuss it.

This fixes some omissions in the previous patch and error in its test cases.

CLUSTER ON recurses to children, since I think a clustered parent index means
that all its child indexes are clustered. "SET WITHOUT CLUSTER" doesn't have
to recurse to children, but I did it like that for consistency and it avoids
the need to special case InvalidOid.

The previous patch failed pg_upgrade when restoring a clustered, parent index,
since it's marked INVALID until indexes have been built on all child tables, so
CLUSTER ON was rejected on invalid index.

So I think CLUSTER ON needs to be a separate pg_dump object, to allow attaching
the child index (thereby making the parent "valid") to happen before SET
CLUSTER on the parent index.

Rebased on b5913f612 and now a3dc92600.

This resolves ORDER BY test failure with COLLATE "C".

It occured to me that progress reporting should expose this.

I did this in the style of pg_stat_progress_create_index, adding columns
partitions_total and partitions_done showing the overall progress. The progress
of individual partitions is also visible in {blocks,tuples}_{done,total}.
This seems odd, but that's how the index view behaves.

Rebased on 8a8f4d8ede288c2a29105f4708e22ce7f3526149.

This also resolves an issue in the last patch which would've broken progress
reporting of vacuum full.

And take the suggestion to move memory context switching outside the loop.

--
Justin

Attachments:

v9-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload

From 3a39edbed1595efc230aac1de09fc276cc7ca7f4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 26 Nov 2020 14:37:08 -0600
Subject: [PATCH v9 1/8] pg_dump: make CLUSTER ON a separate dump object..

..since it needs to be restored after any child indexes are restored *and
attached*.  The order needs to be:

1) restore child and parent index (order doesn't matter);
2) attach child index;
3) set cluster on child and parent index (order doesn't matter);
---
 src/bin/pg_dump/pg_dump.c      | 86 ++++++++++++++++++++++++++--------
 src/bin/pg_dump/pg_dump.h      |  8 ++++
 src/bin/pg_dump/pg_dump_sort.c |  8 ++++
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index eb988d7eb4..e93d2eb828 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -208,6 +208,7 @@ static void dumpSequence(Archive *fout, const TableInfo *tbinfo);
 static void dumpSequenceData(Archive *fout, const TableDataInfo *tdinfo);
 static void dumpIndex(Archive *fout, const IndxInfo *indxinfo);
 static void dumpIndexAttach(Archive *fout, const IndexAttachInfo *attachinfo);
+static void dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo);
 static void dumpStatisticsExt(Archive *fout, const StatsExtInfo *statsextinfo);
 static void dumpConstraint(Archive *fout, const ConstraintInfo *coninfo);
 static void dumpTableConstraintComment(Archive *fout, const ConstraintInfo *coninfo);
@@ -7092,6 +7093,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				i_inddependcollversions;
 	int			ntups;
 
+	int	ncluster = 0;
+	IndexClusterInfo *clusterinfo;
+	clusterinfo = (IndexClusterInfo *)
+		pg_malloc0(numTables * sizeof(IndexClusterInfo));
+
 	for (i = 0; i < numTables; i++)
 	{
 		TableInfo  *tbinfo = &tblinfo[i];
@@ -7471,6 +7477,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				/* Plain secondary index */
 				indxinfo[j].indexconstraint = 0;
 			}
+
+			/* Record each table's CLUSTERed index, if any */
+			if (indxinfo[j].indisclustered)
+			{
+				IndxInfo   *index = &indxinfo[j];
+				IndexClusterInfo *cluster = &clusterinfo[ncluster];
+
+				cluster->dobj.objType = DO_INDEX_CLUSTER_ON;
+				cluster->dobj.catId.tableoid = 0;
+				cluster->dobj.catId.oid = 0;
+				AssignDumpId(&cluster->dobj);
+				cluster->dobj.name = pg_strdup(index->dobj.name);
+				cluster->dobj.namespace = index->indextable->dobj.namespace;
+				cluster->index = index;
+				cluster->indextable = &tblinfo[i];
+
+				/* The CLUSTER ON depends on its index.. */
+				addObjectDependency(&cluster->dobj, index->dobj.dumpId);
+
+				ncluster++;
+			}
 		}
 
 		PQclear(res);
@@ -10323,6 +10350,9 @@ dumpDumpableObject(Archive *fout, const DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_INDEX_CLUSTER_ON:
+			dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -16543,6 +16573,41 @@ getAttrName(int attrnum, const TableInfo *tblInfo)
 	return NULL;				/* keep compiler quiet */
 }
 
+/*
+ * dumpIndexClusterOn
+ *	  record that the index is clustered.
+ */
+static void
+dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+	TableInfo	*tbinfo = clusterinfo->indextable;
+	char		*qindxname;
+	PQExpBuffer	q;
+
+	if (dopt->dataOnly)
+		return;
+
+	q = createPQExpBuffer();
+	qindxname = pg_strdup(fmtId(clusterinfo->dobj.name));
+
+	/* index name is not qualified in this syntax */
+	appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER ON %s;\n",
+					  fmtQualifiedDumpable(tbinfo), qindxname);
+
+	if (clusterinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+		ArchiveEntry(fout, clusterinfo->dobj.catId, clusterinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = clusterinfo->dobj.name,
+								  .namespace = tbinfo->dobj.namespace->dobj.name,
+								  .owner = tbinfo->rolname,
+								  .description = "INDEX CLUSTER ON",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = q->data));
+
+	destroyPQExpBuffer(q);
+	free(qindxname);
+}
+
 /*
  * dumpIndex
  *	  write out to fout a user-defined index
@@ -16597,16 +16662,6 @@ dumpIndex(Archive *fout, const IndxInfo *indxinfo)
 		 * similar code in dumpConstraint!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  qindxname);
-		}
-
 		/*
 		 * If the index has any statistics on some of its columns, generate
 		 * the associated ALTER INDEX queries.
@@ -16933,16 +16988,6 @@ dumpConstraint(Archive *fout, const ConstraintInfo *coninfo)
 		 * similar code in dumpIndex!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  fmtId(indxinfo->dobj.name));
-		}
-
 		/* If the index defines identity, we need to record that. */
 		if (indxinfo->indisreplident)
 		{
@@ -18448,6 +18493,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 				break;
 			case DO_INDEX:
 			case DO_INDEX_ATTACH:
+			case DO_INDEX_CLUSTER_ON:
 			case DO_STATSEXT:
 			case DO_REFRESH_MATVIEW:
 			case DO_TRIGGER:
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 0a2213fb06..627c8fbdab 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -54,6 +54,7 @@ typedef enum
 	DO_ATTRDEF,
 	DO_INDEX,
 	DO_INDEX_ATTACH,
+	DO_INDEX_CLUSTER_ON,
 	DO_STATSEXT,
 	DO_RULE,
 	DO_TRIGGER,
@@ -386,6 +387,13 @@ typedef struct _indxInfo
 	DumpId		indexconstraint;
 } IndxInfo;
 
+typedef struct _indexClusterInfo
+{
+	DumpableObject dobj;
+	TableInfo  *indextable;		/* link to table the index is for */
+	IndxInfo   *index;		/* link to index itself */
+} IndexClusterInfo;
+
 typedef struct _indexAttachInfo
 {
 	DumpableObject dobj;
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 46461fb6a1..dd5b233196 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -75,6 +75,7 @@ enum dbObjectTypePriorities
 	PRIO_CONSTRAINT,
 	PRIO_INDEX,
 	PRIO_INDEX_ATTACH,
+	PRIO_INDEX_CLUSTER_ON,
 	PRIO_STATSEXT,
 	PRIO_RULE,
 	PRIO_TRIGGER,
@@ -108,6 +109,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_ATTRDEF,				/* DO_ATTRDEF */
 	PRIO_INDEX,					/* DO_INDEX */
 	PRIO_INDEX_ATTACH,			/* DO_INDEX_ATTACH */
+	PRIO_INDEX_CLUSTER_ON,			/* DO_INDEX_CLUSTER_ON */
 	PRIO_STATSEXT,				/* DO_STATSEXT */
 	PRIO_RULE,					/* DO_RULE */
 	PRIO_TRIGGER,				/* DO_TRIGGER */
@@ -136,6 +138,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+
 };
 
 StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
@@ -1348,6 +1351,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "INDEX ATTACH %s  (ID %d)",
 					 obj->name, obj->dumpId);
 			return;
+		case DO_INDEX_CLUSTER_ON:
+			snprintf(buf, bufsize,
+					 "INDEX CLUSTER ON %s  (ID %d)",
+					 obj->name, obj->dumpId);
+			return;
 		case DO_STATSEXT:
 			snprintf(buf, bufsize,
 					 "STATISTICS %s  (ID %d OID %u)",
-- 
2.17.0

v9-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From 0933879bdd1d21fdb8d31a3340f3815c45dfc7b4 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v9 2/8] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.

VACUUM (including vacuum full) has recursed into partition hierarchies since
partitions were introduced in v10 (3c3bb9933).
---
 doc/src/sgml/ref/cluster.sgml         |   7 ++
 src/backend/commands/cluster.c        | 173 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/commands/cluster.h        |   1 +
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 209 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 0d9720fd8e..17509b35eb 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -197,6 +197,13 @@ CLUSTER [VERBOSE]
     in the <structname>pg_stat_progress_cluster</structname> view. See
     <xref linkend="cluster-progress-reporting"/> for details.
   </para>
+
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    partition of the specified partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 096a06f7b3..0c08ac56dc 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -135,7 +140,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -146,14 +151,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -189,10 +186,34 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		}
 
 		/* close relation, keep lock till commit */
-		table_close(rel, NoLock);
+		table_close(rel, ShareUpdateExclusiveLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, &params);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, &params);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, params.options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -202,7 +223,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -225,28 +245,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-			ClusterParams cluster_params = params;
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_params.options |= CLUOPT_RECHECK;
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						&cluster_params);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -352,9 +351,10 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -398,8 +398,13 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -415,6 +420,14 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -483,6 +496,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -507,12 +523,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -584,10 +594,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1582,3 +1588,76 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	/*
+	 * We have to build the list in a different memory context so it will
+	 * survive the cross-transaction processing
+	 */
+	old_context = MemoryContextSwitchTo(cluster_context);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels (including the top indexOid) are included,
+		 * so as to be processed by cluster_rel, which calls
+		 * check_index_is_clusterable() and mark_index_clustered().
+		 */
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+	}
+
+	MemoryContextSwitchTo(old_context);
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+		ClusterParams cluster_params = { .options = options, };
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_params.options |= CLUOPT_RECHECK;
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					&cluster_params);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 9f0208ac49..53d585c4d3 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -588,6 +588,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index a941f2accd..c30ca01726 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -22,6 +22,7 @@
 /* flag bits for ClusterParams->flags */
 #define CLUOPT_RECHECK 0x01		/* recheck relation state */
 #define CLUOPT_VERBOSE 0x02		/* print progress info */
+#define CLUOPT_RECHECK_ISCLUSTERED 0x04	/* recheck relation state for indisclustered */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index bdae8fe00c..c74cfa88cc 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -439,14 +439,62 @@ select * from clstr_temp;
 
 drop table clstr_temp;
 RESET SESSION AUTHORIZATION;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT relname, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY relname COLLATE "C";
+      relname      | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 188183647c..9bcc77695c 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -196,12 +196,30 @@ drop table clstr_temp;
 
 RESET SESSION AUTHORIZATION;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+-- Check that clustering sets new indisclustered:
+SELECT relname, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY relname COLLATE "C";
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v9-0003-f-progress-reporting.patchtext/x-diff; charset=us-asciiDownload

From 7001773daa339a49cb7b2e32c398fdd1a8e299b8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Mon, 8 Feb 2021 20:45:26 -0600
Subject: [PATCH v9 3/8] f! progress reporting..

This follows the precedent of pg_stat_progress_create_index, which
simultaneously shows the progress of 1) the leaf partition, and 2) the overall
progress as partitions_done / partitions_total.

This also includes current_child_index_relid, but the create_index view
doesn't, so maybe this shouldn't, either.
---
 doc/src/sgml/monitoring.sgml         | 21 +++++++++++
 src/backend/catalog/system_views.sql |  4 +-
 src/backend/commands/cluster.c       | 55 ++++++++++++++++------------
 src/backend/commands/vacuum.c        |  5 +++
 src/include/commands/progress.h      |  3 ++
 src/test/regress/expected/rules.out  |  4 +-
 6 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 3513e127b7..dca47f6a48 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6231,6 +6231,27 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
        is <literal>rebuilding index</literal>.
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>total_partitions</structfield> <type>bigint</type>
+      </para>
+       When clustering a partitioned table, this column is set to the total
+       number of partitions to be clustered.
+      <para>
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>done_partitions</structfield> <type>bigint</type>
+      </para>
+      <para>
+       When clustering a partitioned table, this column is set to the number
+       of partitions which have been clustered.
+      </para></entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index fc94a73a54..21c27f0437 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1070,7 +1070,9 @@ CREATE VIEW pg_stat_progress_cluster AS
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS index_rebuild_count,
+        S.param18 AS partitions_total,
+        S.param19 AS partitions_done
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 0c08ac56dc..7d8d1d8a69 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -77,7 +77,7 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
-static void cluster_multiple_rels(List *rvs, int options);
+static void cluster_multiple_rels(List *rvs, int options, bool ispartitioned);
 
 
 /*---------------------------------------------------------------------------
@@ -188,16 +188,26 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		/* close relation, keep lock till commit */
 		table_close(rel, ShareUpdateExclusiveLock);
 
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
 		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 		{
 			/* Do the job. */
+			pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+					PROGRESS_CLUSTER_COMMAND_CLUSTER);
 			cluster_rel(tableOid, indexOid, &params);
+			pgstat_progress_end_command();
 		}
 		else
 		{
 			List	   *rvs;
 			MemoryContext cluster_context;
 
+			int64		progress_val[2];
+			const int	progress_index[2] = {
+				PROGRESS_CLUSTER_COMMAND,
+				PROGRESS_CLUSTER_PARTITIONS_TOTAL,
+			};
+
 			/* Refuse to hold strong locks in a user transaction */
 			PreventInTransactionBlock(isTopLevel, "CLUSTER");
 
@@ -206,7 +216,15 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 												ALLOCSET_DEFAULT_SIZES);
 
 			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-			cluster_multiple_rels(rvs, params.options);
+
+			/* Report command and number of partitions */
+			progress_val[0] = PROGRESS_CLUSTER_COMMAND_CLUSTER;
+			/* XXX: rvs includes the parent, which is not a "partition" */
+			progress_val[1] = list_length(rvs);
+
+			pgstat_progress_update_multi_param(2, progress_index, progress_val);
+			cluster_multiple_rels(rvs, params.options, true);
+			pgstat_progress_end_command();
 
 			/* Start a new transaction for the cleanup work. */
 			StartTransactionCommand();
@@ -245,7 +263,11 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
+
+		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+				PROGRESS_CLUSTER_COMMAND_CLUSTER);
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED, false);
+		pgstat_progress_end_command();
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -282,14 +304,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
-
 	/*
 	 * We grab exclusive access to the target rel and index for the duration
 	 * of the transaction.  (This is redundant for the single-transaction
@@ -300,10 +314,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* If the table has gone away, we can skip processing it */
 	if (!OldHeap)
-	{
-		pgstat_progress_end_command();
 		return;
-	}
 
 	/*
 	 * Since we may open a new transaction for each relation, we have to check
@@ -319,7 +330,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		if (!pg_class_ownercheck(tableOid, GetUserId()))
 		{
 			relation_close(OldHeap, AccessExclusiveLock);
-			pgstat_progress_end_command();
 			return;
 		}
 
@@ -334,7 +344,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		if (RELATION_IS_OTHER_TEMP(OldHeap))
 		{
 			relation_close(OldHeap, AccessExclusiveLock);
-			pgstat_progress_end_command();
 			return;
 		}
 
@@ -346,7 +355,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
-				pgstat_progress_end_command();
 				return;
 			}
 
@@ -357,7 +365,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
-				pgstat_progress_end_command();
 				return;
 			}
 		}
@@ -416,7 +423,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		!RelationIsPopulated(OldHeap))
 	{
 		relation_close(OldHeap, AccessExclusiveLock);
-		pgstat_progress_end_command();
 		return;
 	}
 
@@ -424,7 +430,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
 	{
 		relation_close(OldHeap, AccessExclusiveLock);
-		pgstat_progress_end_command();
 		return;
 	}
 
@@ -440,8 +445,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	rebuild_relation(OldHeap, indexOid, verbose);
 
 	/* NB: rebuild_relation does table_close() on OldHeap */
-
-	pgstat_progress_end_command();
 }
 
 /*
@@ -1632,9 +1635,10 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 
 /* Cluster each relation in a separate transaction */
 static void
-cluster_multiple_rels(List *rvs, int options)
+cluster_multiple_rels(List *rvs, int options, bool ispartitioned)
 {
 	ListCell *lc;
+	off_t	  childnum = 0;
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
@@ -1657,6 +1661,11 @@ cluster_multiple_rels(List *rvs, int options)
 		cluster_rel(rvtc->tableOid, rvtc->indexOid,
 					&cluster_params);
 
+		if (ispartitioned)
+			pgstat_progress_update_param(PROGRESS_CLUSTER_PARTITIONS_DONE,
+					++childnum);
+
+
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index c064352e23..05d52858b1 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -37,6 +37,7 @@
 #include "catalog/pg_namespace.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
+#include "commands/progress.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -1937,7 +1938,11 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 			cluster_params.options |= CLUOPT_VERBOSE;
 
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, relid);
+		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+				PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
 		cluster_rel(relid, InvalidOid, &cluster_params);
+		pgstat_progress_end_command();
 	}
 	else
 		table_relation_vacuum(onerel, params, vac_strategy);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index 95ec5d02e9..13499a3e72 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -60,6 +60,9 @@
 #define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
 #define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
 #define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/* Need to avoid the CREATE INDEX params */
+#define PROGRESS_CLUSTER_PARTITIONS_TOTAL	17
+#define PROGRESS_CLUSTER_PARTITIONS_DONE	18
 
 /* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
 #define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index b1c9b7bdfe..c888415235 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1943,7 +1943,9 @@ pg_stat_progress_cluster| SELECT s.pid,
     s.param5 AS heap_tuples_written,
     s.param6 AS heap_blks_total,
     s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS index_rebuild_count,
+    s.param18 AS partitions_total,
+    s.param19 AS partitions_done
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
-- 
2.17.0

v9-0004-Propagate-changes-to-indisclustered-to-child-pare.patchtext/x-diff; charset=us-asciiDownload

From a1c4aecc9dd08b96c460ddc11c7d839259db83a3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v9 4/8] Propagate changes to indisclustered to child/parents

---
 src/backend/commands/cluster.c        | 110 ++++++++++++++++----------
 src/backend/commands/indexcmds.c      |   2 +
 src/test/regress/expected/cluster.out |  46 +++++++++++
 src/test/regress/sql/cluster.sql      |  11 +++
 4 files changed, 126 insertions(+), 43 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 7d8d1d8a69..dd8014c206 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -74,6 +74,7 @@ static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -513,66 +514,89 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
- * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ * With indexOid == InvalidOid, mark all indexes of rel not-clustered.
+ * Otherwise, mark children of the clustered index as clustered, and parents of
+ * other indexes as unclustered.
+ * We wish to maintain the following properties:
+ * 1) Only one index on a relation can be marked clustered at once
+ * 2) If a partitioned index is clustered, then all its children must be
+ *     clustered.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
-	Relation	pg_index;
-	ListCell   *index;
-
-	/*
-	 * If the index is already marked clustered, no need to do anything.
-	 */
-	if (OidIsValid(indexOid))
-	{
-		if (get_index_isclustered(indexOid))
-			return;
-	}
+	ListCell	*lc, *lc2;
+	List		*indexes;
+	Relation	pg_index = table_open(IndexRelationId, RowExclusiveLock);
+	List		*inh = find_all_inheritors(RelationGetRelid(rel),
+											ShareRowExclusiveLock, NULL);
 
 	/*
 	 * Check each index of the relation and set/clear the bit as needed.
+	 * Iterate over the relation's children rather than the index's children
+	 * since we need to unset cluster for indexes on intermediate children,
+	 * too.
 	 */
-	pg_index = table_open(IndexRelationId, RowExclusiveLock);
-
-	foreach(index, RelationGetIndexList(rel))
+	foreach(lc, inh)
 	{
-		Oid			thisIndexOid = lfirst_oid(index);
-
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+		Oid			inhrelid = lfirst_oid(lc);
+		Relation	thisrel = table_open(inhrelid, ShareRowExclusiveLock);
 
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
+		indexes = RelationGetIndexList(thisrel);
+		foreach (lc2, indexes)
 		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-		else if (thisIndexOid == indexOid)
-		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
+			bool	isclustered;
+			Oid		thisIndexOid = lfirst_oid(lc2);
+			List	*parentoids = get_rel_relispartition(thisIndexOid) ?
+				get_partition_ancestors(thisIndexOid) : NIL;
 
-		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
-									 InvalidOid, is_internal);
+			/*
+			 * A child of the clustered index must be set clustered;
+			 * indexes which are not children of the clustered index are
+			 * set unclustered
+			 */
+			isclustered = (thisIndexOid == indexOid) ||
+					list_member_oid(parentoids, indexOid);
+			Assert(OidIsValid(indexOid) || !isclustered);
+			set_indisclustered(thisIndexOid, isclustered, pg_index);
+
+			InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
+										 InvalidOid, is_internal);
+		}
 
-		heap_freetuple(indexTuple);
+		list_free(indexes);
+		table_close(thisrel, ShareRowExclusiveLock);
 	}
+	list_free(inh);
 
 	table_close(pg_index, RowExclusiveLock);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8bc652ecd3..5f8fc2ea16 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -25,6 +25,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -32,6 +33,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index c74cfa88cc..1d436dfaae 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -495,6 +495,52 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a)
+
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
+       Partitioned table "public.clstrpart1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (1) TO (10)
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart1_a_idx" btree (a)
+    "clstrpart1_idx_2" btree (a) CLUSTER
+Number of partitions: 2 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 9bcc77695c..0ded2be1ca 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -220,6 +220,17 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v9-0005-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload

From 015a61bf17129091c21515aee7e8702fce72c0b3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 18:58:03 -0600
Subject: [PATCH v9 5/8] Invalidate parent indexes

---
 src/backend/commands/cluster.c        | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 26 ++++++++++++++++++++++++++
 src/test/regress/sql/cluster.sql      |  8 ++++++++
 3 files changed, 55 insertions(+)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index dd8014c206..8a2e66bbeb 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -598,6 +598,27 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	}
 	list_free(inh);
 
+	/*
+	 * Set parent of all indexes as unclustered when a rel is unclustered; and,
+	 * when an index is clustered, set parents of all /other/ indexes as
+	 * unclustered.
+	 */
+	indexes = RelationGetIndexList(rel);
+	foreach (lc, indexes)
+	{
+		Oid	thisIndexOid = lfirst_oid(lc);
+
+		if (thisIndexOid == indexOid)
+			continue;
+
+		while (get_rel_relispartition(thisIndexOid))
+		{
+			thisIndexOid = get_partition_parent(thisIndexOid);
+			set_indisclustered(thisIndexOid, false, pg_index);
+		}
+	}
+	list_free(indexes);
+
 	table_close(pg_index, RowExclusiveLock);
 }
 
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 1d436dfaae..6cba3cc4f9 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -541,6 +541,32 @@ Indexes:
     "clstrpart1_idx_2" btree (a) CLUSTER
 Number of partitions: 2 (Use \d+ to list them.)
 
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 0ded2be1ca..a1d132f288 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -231,6 +231,14 @@ CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
 \d clstrpart1
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v9-0006-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload

From aeada0f1098d3de5ffae9ff2b24ff1c982125621 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 19:11:41 -0600
Subject: [PATCH v9 6/8] Invalidate parent index cluster on attach

---
 src/backend/commands/indexcmds.c      | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 14 ++++++++++++++
 src/test/regress/sql/cluster.sql      |  5 +++++
 3 files changed, 40 insertions(+)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 5f8fc2ea16..4485b01b21 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4126,6 +4126,27 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/*
+	 * If the attached index is not clustered, invalidate cluster mark on
+	 * any parents
+	 */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 6cba3cc4f9..e7f0889743 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -567,6 +567,20 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 4 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index a1d132f288..3c8085c69e 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -239,6 +239,11 @@ CLUSTER clstrpart1 USING clstrpart1_idx_2;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
 \d clstrpart
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v9-0007-Preserve-indisclustered-on-children-of-clustered-.patchtext/x-diff; charset=us-asciiDownload

From ce4f904b4d68218fd7f2420687de25e9e9157949 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v9 7/8] Preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  3 ++-
 src/test/regress/expected/cluster.out | 12 ++++++++++++
 src/test/regress/sql/cluster.sql      |  4 ++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4ef61b5efd..71649b38ee 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -604,7 +604,8 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) &&
+												get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e7f0889743..a3943c13f5 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -581,6 +581,18 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 4 (Use \d+ to list them.)
 
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 3c8085c69e..22628e90ca 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -244,6 +244,10 @@ ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
 ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
 \d clstrpart
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v9-0008-pg_dump-partitioned-index-depend-on-its-partition.patchtext/x-diff; charset=us-asciiDownload

From 44aa5170d62a1ec5453f1e02c63c754998fa8738 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 25 Nov 2020 17:34:07 -0600
Subject: [PATCH v9 8/8] pg_dump: partitioned index depend on its partitions

This is required for restoring clustered parent index, which is marked INVALID
until indexes have been built on all its child tables, and it's prohibited to
CLUSTER ON an INVALID index

See also: 8cff4f5348d075e063100071013f00a900c32b0f
---
 src/backend/commands/tablecmds.c | 6 +++---
 src/bin/pg_dump/common.c         | 8 ++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 559fa1d2e5..eb56890311 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -17497,6 +17497,9 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		table_close(idxRel, RowExclusiveLock);
 	}
 
+	/* make sure we see the validation we just did */
+	CommandCounterIncrement();
+
 	/*
 	 * If this index is in turn a partition of a larger index, validating it
 	 * might cause the parent to become valid also.  Try that.
@@ -17508,9 +17511,6 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		Relation	parentIdx,
 					parentTbl;
 
-		/* make sure we see the validation we just did */
-		CommandCounterIncrement();
-
 		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx));
 		parentTblId = get_partition_parent(RelationGetRelid(partedTbl));
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a261a5545..cdfba058fc 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -429,6 +429,12 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			attachinfo[k].parentIdx = parentidx;
 			attachinfo[k].partitionIdx = index;
 
+			/*
+			 * We want dependencies from parent to partition (so that the
+			 * partition index is created first)
+			 */
+			addObjectDependency(&parentidx->dobj, index->dobj.dumpId);
+
 			/*
 			 * We must state the DO_INDEX_ATTACH object's dependencies
 			 * explicitly, since it will not match anything in pg_depend.
@@ -446,6 +452,8 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			 */
 			addObjectDependency(&attachinfo[k].dobj, index->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj, parentidx->dobj.dumpId);
+			// addObjectDependency(&parentidx->dobj, attachinfo[k].dobj.dumpId);
+
 			addObjectDependency(&attachinfo[k].dobj,
 								index->indextable->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj,
-- 
2.17.0

#10

Justin Pryzby

pryzby@telsasoft.com

almost 5 years ago

In reply to: Justin Pryzby (#9)

8 attachment(s)

Re: CLUSTER on partitioned index

@cfbot: rebased

Attachments:

v10-0001-pg_dump-make-CLUSTER-ON-a-separate-dump-object.patchtext/x-diff; charset=us-asciiDownload

From 686cd8fc644f1f86d81d7748b66feddd634c4dc8 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 26 Nov 2020 14:37:08 -0600
Subject: [PATCH v10 1/8] pg_dump: make CLUSTER ON a separate dump object..

..since it needs to be restored after any child indexes are restored *and
attached*.  The order needs to be:

1) restore child and parent index (order doesn't matter);
2) attach child index;
3) set cluster on child and parent index (order doesn't matter);
---
 src/bin/pg_dump/pg_dump.c      | 86 ++++++++++++++++++++++++++--------
 src/bin/pg_dump/pg_dump.h      |  8 ++++
 src/bin/pg_dump/pg_dump_sort.c |  8 ++++
 3 files changed, 82 insertions(+), 20 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 25717ce0e6..9a3044fd8c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -215,6 +215,7 @@ static void dumpSequence(Archive *fout, const TableInfo *tbinfo);
 static void dumpSequenceData(Archive *fout, const TableDataInfo *tdinfo);
 static void dumpIndex(Archive *fout, const IndxInfo *indxinfo);
 static void dumpIndexAttach(Archive *fout, const IndexAttachInfo *attachinfo);
+static void dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo);
 static void dumpStatisticsExt(Archive *fout, const StatsExtInfo *statsextinfo);
 static void dumpConstraint(Archive *fout, const ConstraintInfo *coninfo);
 static void dumpTableConstraintComment(Archive *fout, const ConstraintInfo *coninfo);
@@ -7232,6 +7233,11 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				i_inddependcollversions;
 	int			ntups;
 
+	int	ncluster = 0;
+	IndexClusterInfo *clusterinfo;
+	clusterinfo = (IndexClusterInfo *)
+		pg_malloc0(numTables * sizeof(IndexClusterInfo));
+
 	for (i = 0; i < numTables; i++)
 	{
 		TableInfo  *tbinfo = &tblinfo[i];
@@ -7611,6 +7617,27 @@ getIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 				/* Plain secondary index */
 				indxinfo[j].indexconstraint = 0;
 			}
+
+			/* Record each table's CLUSTERed index, if any */
+			if (indxinfo[j].indisclustered)
+			{
+				IndxInfo   *index = &indxinfo[j];
+				IndexClusterInfo *cluster = &clusterinfo[ncluster];
+
+				cluster->dobj.objType = DO_INDEX_CLUSTER_ON;
+				cluster->dobj.catId.tableoid = 0;
+				cluster->dobj.catId.oid = 0;
+				AssignDumpId(&cluster->dobj);
+				cluster->dobj.name = pg_strdup(index->dobj.name);
+				cluster->dobj.namespace = index->indextable->dobj.namespace;
+				cluster->index = index;
+				cluster->indextable = &tblinfo[i];
+
+				/* The CLUSTER ON depends on its index.. */
+				addObjectDependency(&cluster->dobj, index->dobj.dumpId);
+
+				ncluster++;
+			}
 		}
 
 		PQclear(res);
@@ -10472,6 +10499,9 @@ dumpDumpableObject(Archive *fout, const DumpableObject *dobj)
 		case DO_SUBSCRIPTION:
 			dumpSubscription(fout, (const SubscriptionInfo *) dobj);
 			break;
+		case DO_INDEX_CLUSTER_ON:
+			dumpIndexClusterOn(fout, (IndexClusterInfo *) dobj);
+			break;
 		case DO_PRE_DATA_BOUNDARY:
 		case DO_POST_DATA_BOUNDARY:
 			/* never dumped, nothing to do */
@@ -16721,6 +16751,41 @@ getAttrName(int attrnum, const TableInfo *tblInfo)
 	return NULL;				/* keep compiler quiet */
 }
 
+/*
+ * dumpIndexClusterOn
+ *	  record that the index is clustered.
+ */
+static void
+dumpIndexClusterOn(Archive *fout, const IndexClusterInfo *clusterinfo)
+{
+	DumpOptions *dopt = fout->dopt;
+	TableInfo	*tbinfo = clusterinfo->indextable;
+	char		*qindxname;
+	PQExpBuffer	q;
+
+	if (dopt->dataOnly)
+		return;
+
+	q = createPQExpBuffer();
+	qindxname = pg_strdup(fmtId(clusterinfo->dobj.name));
+
+	/* index name is not qualified in this syntax */
+	appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER ON %s;\n",
+					  fmtQualifiedDumpable(tbinfo), qindxname);
+
+	if (clusterinfo->dobj.dump & DUMP_COMPONENT_DEFINITION)
+		ArchiveEntry(fout, clusterinfo->dobj.catId, clusterinfo->dobj.dumpId,
+					 ARCHIVE_OPTS(.tag = clusterinfo->dobj.name,
+								  .namespace = tbinfo->dobj.namespace->dobj.name,
+								  .owner = tbinfo->rolname,
+								  .description = "INDEX CLUSTER ON",
+								  .section = SECTION_POST_DATA,
+								  .createStmt = q->data));
+
+	destroyPQExpBuffer(q);
+	free(qindxname);
+}
+
 /*
  * dumpIndex
  *	  write out to fout a user-defined index
@@ -16775,16 +16840,6 @@ dumpIndex(Archive *fout, const IndxInfo *indxinfo)
 		 * similar code in dumpConstraint!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  qindxname);
-		}
-
 		/*
 		 * If the index has any statistics on some of its columns, generate
 		 * the associated ALTER INDEX queries.
@@ -17111,16 +17166,6 @@ dumpConstraint(Archive *fout, const ConstraintInfo *coninfo)
 		 * similar code in dumpIndex!
 		 */
 
-		/* If the index is clustered, we need to record that. */
-		if (indxinfo->indisclustered)
-		{
-			appendPQExpBuffer(q, "\nALTER TABLE %s CLUSTER",
-							  fmtQualifiedDumpable(tbinfo));
-			/* index name is not qualified in this syntax */
-			appendPQExpBuffer(q, " ON %s;\n",
-							  fmtId(indxinfo->dobj.name));
-		}
-
 		/* If the index defines identity, we need to record that. */
 		if (indxinfo->indisreplident)
 		{
@@ -18626,6 +18671,7 @@ addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
 				break;
 			case DO_INDEX:
 			case DO_INDEX_ATTACH:
+			case DO_INDEX_CLUSTER_ON:
 			case DO_STATSEXT:
 			case DO_REFRESH_MATVIEW:
 			case DO_TRIGGER:
diff --git a/src/bin/pg_dump/pg_dump.h b/src/bin/pg_dump/pg_dump.h
index 5340843081..af916632db 100644
--- a/src/bin/pg_dump/pg_dump.h
+++ b/src/bin/pg_dump/pg_dump.h
@@ -54,6 +54,7 @@ typedef enum
 	DO_ATTRDEF,
 	DO_INDEX,
 	DO_INDEX_ATTACH,
+	DO_INDEX_CLUSTER_ON,
 	DO_STATSEXT,
 	DO_RULE,
 	DO_TRIGGER,
@@ -387,6 +388,13 @@ typedef struct _indxInfo
 	DumpId		indexconstraint;
 } IndxInfo;
 
+typedef struct _indexClusterInfo
+{
+	DumpableObject dobj;
+	TableInfo  *indextable;		/* link to table the index is for */
+	IndxInfo   *index;		/* link to index itself */
+} IndexClusterInfo;
+
 typedef struct _indexAttachInfo
 {
 	DumpableObject dobj;
diff --git a/src/bin/pg_dump/pg_dump_sort.c b/src/bin/pg_dump/pg_dump_sort.c
index 46461fb6a1..dd5b233196 100644
--- a/src/bin/pg_dump/pg_dump_sort.c
+++ b/src/bin/pg_dump/pg_dump_sort.c
@@ -75,6 +75,7 @@ enum dbObjectTypePriorities
 	PRIO_CONSTRAINT,
 	PRIO_INDEX,
 	PRIO_INDEX_ATTACH,
+	PRIO_INDEX_CLUSTER_ON,
 	PRIO_STATSEXT,
 	PRIO_RULE,
 	PRIO_TRIGGER,
@@ -108,6 +109,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_ATTRDEF,				/* DO_ATTRDEF */
 	PRIO_INDEX,					/* DO_INDEX */
 	PRIO_INDEX_ATTACH,			/* DO_INDEX_ATTACH */
+	PRIO_INDEX_CLUSTER_ON,			/* DO_INDEX_CLUSTER_ON */
 	PRIO_STATSEXT,				/* DO_STATSEXT */
 	PRIO_RULE,					/* DO_RULE */
 	PRIO_TRIGGER,				/* DO_TRIGGER */
@@ -136,6 +138,7 @@ static const int dbObjectTypePriority[] =
 	PRIO_PUBLICATION,			/* DO_PUBLICATION */
 	PRIO_PUBLICATION_REL,		/* DO_PUBLICATION_REL */
 	PRIO_SUBSCRIPTION			/* DO_SUBSCRIPTION */
+
 };
 
 StaticAssertDecl(lengthof(dbObjectTypePriority) == (DO_SUBSCRIPTION + 1),
@@ -1348,6 +1351,11 @@ describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)
 					 "INDEX ATTACH %s  (ID %d)",
 					 obj->name, obj->dumpId);
 			return;
+		case DO_INDEX_CLUSTER_ON:
+			snprintf(buf, bufsize,
+					 "INDEX CLUSTER ON %s  (ID %d)",
+					 obj->name, obj->dumpId);
+			return;
 		case DO_STATSEXT:
 			snprintf(buf, bufsize,
 					 "STATISTICS %s  (ID %d OID %u)",
-- 
2.17.0

v10-0002-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From 3d846ff4f4f311ade1e8c7afe1e7481249ca6d89 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v10 2/8] Implement CLUSTER of partitioned table..

This requires either specification of a partitioned index on which to cluster,
or that an partitioned index was previously set clustered.

VACUUM (including vacuum full) has recursed into partition hierarchies since
partitions were introduced in v10 (3c3bb9933).
---
 doc/src/sgml/ref/cluster.sgml         |   7 ++
 src/backend/commands/cluster.c        | 173 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/commands/cluster.h        |   1 +
 src/test/regress/expected/cluster.out |  58 ++++++++-
 src/test/regress/sql/cluster.sql      |  24 +++-
 6 files changed, 209 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 0d9720fd8e..17509b35eb 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -197,6 +197,13 @@ CLUSTER [VERBOSE]
     in the <structname>pg_stat_progress_cluster</structname> view. See
     <xref linkend="cluster-progress-reporting"/> for details.
   </para>
+
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    partition of the specified partitioned index or (if not specified) the
+    partitioned index marked as clustered.
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 096a06f7b3..0c08ac56dc 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -135,7 +140,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 											AccessExclusiveLock,
 											0,
 											RangeVarCallbackOwnsTable, NULL);
-		rel = table_open(tableOid, NoLock);
+		rel = table_open(tableOid, ShareUpdateExclusiveLock);
 
 		/*
 		 * Reject clustering a remote temp table ... their local buffer
@@ -146,14 +151,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -189,10 +186,34 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		}
 
 		/* close relation, keep lock till commit */
-		table_close(rel, NoLock);
+		table_close(rel, ShareUpdateExclusiveLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, &params);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, &params);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+			cluster_multiple_rels(rvs, params.options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -202,7 +223,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -225,28 +245,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-			ClusterParams cluster_params = params;
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_params.options |= CLUOPT_RECHECK;
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						&cluster_params);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -352,9 +351,10 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -398,8 +398,13 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
+	{
 		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
 
+		/* Mark the index as clustered */
+		mark_index_clustered(OldHeap, indexOid, true);
+	}
+
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
 	 * been populated from its query. No harm is done because there is no data
@@ -415,6 +420,14 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		return;
 	}
 
+	/* For a partitioned rel, we're done. */
+	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
+	{
+		relation_close(OldHeap, AccessExclusiveLock);
+		pgstat_progress_end_command();
+		return;
+	}
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -483,6 +496,9 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	 * the worst consequence of following broken HOT chains would be that we
 	 * might put recently-dead tuples out-of-order in the new table, and there
 	 * is little harm in that.)
+	 *
+	 * This also refuses to cluster on an "incomplete" partitioned index
+	 * created with "ON ONLY".
 	 */
 	if (!OldIndex->rd_index->indisvalid)
 		ereport(ERROR,
@@ -507,12 +523,6 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	Relation	pg_index;
 	ListCell   *index;
 
-	/* Disallow applying to a partitioned table */
-	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-				 errmsg("cannot mark index clustered in partitioned table")));
-
 	/*
 	 * If the index is already marked clustered, no need to do anything.
 	 */
@@ -584,10 +594,6 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
-	if (OidIsValid(indexOid))
-		mark_index_clustered(OldHeap, indexOid, true);
-
 	/* Remember info about rel before closing OldHeap */
 	relpersistence = OldHeap->rd_rel->relpersistence;
 	is_system_catalog = IsSystemRelation(OldHeap);
@@ -1582,3 +1588,76 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and associated index, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	/*
+	 * We have to build the list in a different memory context so it will
+	 * survive the cross-transaction processing
+	 */
+	old_context = MemoryContextSwitchTo(cluster_context);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		/*
+		 * Partitioned rels (including the top indexOid) are included,
+		 * so as to be processed by cluster_rel, which calls
+		 * check_index_is_clusterable() and mark_index_clustered().
+		 */
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+	}
+
+	MemoryContextSwitchTo(old_context);
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+		ClusterParams cluster_params = { .options = options, };
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_params.options |= CLUOPT_RECHECK;
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					&cluster_params);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a053bc1e45..3fd192b8d2 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -588,6 +588,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index a941f2accd..c30ca01726 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -22,6 +22,7 @@
 /* flag bits for ClusterParams->flags */
 #define CLUOPT_RECHECK 0x01		/* recheck relation state */
 #define CLUOPT_VERBOSE 0x02		/* print progress info */
+#define CLUOPT_RECHECK_ISCLUSTERED 0x04	/* recheck relation state for indisclustered */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e46a66952f..8fb496434e 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -444,14 +444,62 @@ DROP TABLE clustertest;
 CREATE TABLE clustertest (f1 int PRIMARY KEY);
 CLUSTER clustertest USING clustertest_pkey;
 CLUSTER clustertest;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
-ERROR:  cannot mark index clustered in partitioned table
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- Check that clustering sets new indisclustered:
+SELECT relname, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY relname COLLATE "C";
+      relname      | relkind | indisclustered 
+-------------------+---------+----------------
+ clstrpart11_a_idx | i       | t
+ clstrpart12_a_idx | I       | t
+ clstrpart1_a_idx  | I       | t
+ clstrpart2_a_idx  | i       | t
+ clstrpart33_a_idx | i       | t
+ clstrpart3_a_idx  | I       | t
+ clstrpart_idx     | I       | t
+(7 rows)
+
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a) CLUSTER
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index aee9cf83e0..57fc661863 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -202,12 +202,30 @@ CREATE TABLE clustertest (f1 int PRIMARY KEY);
 CLUSTER clustertest USING clustertest_pkey;
 CLUSTER clustertest;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+-- Check that clustering sets new indisclustered:
+SELECT relname, relkind, indisclustered FROM pg_partition_tree('clstrpart_idx'::regclass) AS tree JOIN pg_index i ON i.indexrelid=tree.relid JOIN pg_class c ON c.oid=indexrelid ORDER BY relname COLLATE "C";
+CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitioned
+CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
+CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v10-0003-f-progress-reporting.patchtext/x-diff; charset=us-asciiDownload

From c2c0faeed2612aa0d947058e31fc53e54dacd7e6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Mon, 8 Feb 2021 20:45:26 -0600
Subject: [PATCH v10 3/8] f! progress reporting..

This follows the precedent of pg_stat_progress_create_index, which
simultaneously shows the progress of 1) the leaf partition, and 2) the overall
progress as partitions_done / partitions_total.

This also includes current_child_index_relid, but the create_index view
doesn't, so maybe this shouldn't, either.
---
 doc/src/sgml/monitoring.sgml         | 21 +++++++++++
 src/backend/catalog/system_views.sql |  4 +-
 src/backend/commands/cluster.c       | 55 ++++++++++++++++------------
 src/backend/commands/vacuum.c        |  5 +++
 src/include/commands/progress.h      |  3 ++
 src/test/regress/expected/rules.out  |  4 +-
 6 files changed, 67 insertions(+), 25 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 56018745c8..28493c7931 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6293,6 +6293,27 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
        is <literal>rebuilding index</literal>.
       </para></entry>
      </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>total_partitions</structfield> <type>bigint</type>
+      </para>
+       When clustering a partitioned table, this column is set to the total
+       number of partitions to be clustered.
+      <para>
+      </para></entry>
+     </row>
+
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>done_partitions</structfield> <type>bigint</type>
+      </para>
+      <para>
+       When clustering a partitioned table, this column is set to the number
+       of partitions which have been clustered.
+      </para></entry>
+     </row>
+
     </tbody>
    </tgroup>
   </table>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 5f2541d316..0256809f06 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1142,7 +1142,9 @@ CREATE VIEW pg_stat_progress_cluster AS
         S.param5 AS heap_tuples_written,
         S.param6 AS heap_blks_total,
         S.param7 AS heap_blks_scanned,
-        S.param8 AS index_rebuild_count
+        S.param8 AS index_rebuild_count,
+        S.param18 AS partitions_total,
+        S.param19 AS partitions_done
     FROM pg_stat_get_progress_info('CLUSTER') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 0c08ac56dc..7d8d1d8a69 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -77,7 +77,7 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
-static void cluster_multiple_rels(List *rvs, int options);
+static void cluster_multiple_rels(List *rvs, int options, bool ispartitioned);
 
 
 /*---------------------------------------------------------------------------
@@ -188,16 +188,26 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		/* close relation, keep lock till commit */
 		table_close(rel, ShareUpdateExclusiveLock);
 
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
 		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
 		{
 			/* Do the job. */
+			pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+					PROGRESS_CLUSTER_COMMAND_CLUSTER);
 			cluster_rel(tableOid, indexOid, &params);
+			pgstat_progress_end_command();
 		}
 		else
 		{
 			List	   *rvs;
 			MemoryContext cluster_context;
 
+			int64		progress_val[2];
+			const int	progress_index[2] = {
+				PROGRESS_CLUSTER_COMMAND,
+				PROGRESS_CLUSTER_PARTITIONS_TOTAL,
+			};
+
 			/* Refuse to hold strong locks in a user transaction */
 			PreventInTransactionBlock(isTopLevel, "CLUSTER");
 
@@ -206,7 +216,15 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 												ALLOCSET_DEFAULT_SIZES);
 
 			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
-			cluster_multiple_rels(rvs, params.options);
+
+			/* Report command and number of partitions */
+			progress_val[0] = PROGRESS_CLUSTER_COMMAND_CLUSTER;
+			/* XXX: rvs includes the parent, which is not a "partition" */
+			progress_val[1] = list_length(rvs);
+
+			pgstat_progress_update_multi_param(2, progress_index, progress_val);
+			cluster_multiple_rels(rvs, params.options, true);
+			pgstat_progress_end_command();
 
 			/* Start a new transaction for the cleanup work. */
 			StartTransactionCommand();
@@ -245,7 +263,11 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
+
+		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+				PROGRESS_CLUSTER_COMMAND_CLUSTER);
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED, false);
+		pgstat_progress_end_command();
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -282,14 +304,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
-	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
-	if (OidIsValid(indexOid))
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_CLUSTER);
-	else
-		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
-									 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
-
 	/*
 	 * We grab exclusive access to the target rel and index for the duration
 	 * of the transaction.  (This is redundant for the single-transaction
@@ -300,10 +314,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* If the table has gone away, we can skip processing it */
 	if (!OldHeap)
-	{
-		pgstat_progress_end_command();
 		return;
-	}
 
 	/*
 	 * Since we may open a new transaction for each relation, we have to check
@@ -319,7 +330,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		if (!pg_class_ownercheck(tableOid, GetUserId()))
 		{
 			relation_close(OldHeap, AccessExclusiveLock);
-			pgstat_progress_end_command();
 			return;
 		}
 
@@ -334,7 +344,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		if (RELATION_IS_OTHER_TEMP(OldHeap))
 		{
 			relation_close(OldHeap, AccessExclusiveLock);
-			pgstat_progress_end_command();
 			return;
 		}
 
@@ -346,7 +355,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(indexOid)))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
-				pgstat_progress_end_command();
 				return;
 			}
 
@@ -357,7 +365,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
-				pgstat_progress_end_command();
 				return;
 			}
 		}
@@ -416,7 +423,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		!RelationIsPopulated(OldHeap))
 	{
 		relation_close(OldHeap, AccessExclusiveLock);
-		pgstat_progress_end_command();
 		return;
 	}
 
@@ -424,7 +430,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	if (!RELKIND_HAS_STORAGE(get_rel_relkind(tableOid)))
 	{
 		relation_close(OldHeap, AccessExclusiveLock);
-		pgstat_progress_end_command();
 		return;
 	}
 
@@ -440,8 +445,6 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	rebuild_relation(OldHeap, indexOid, verbose);
 
 	/* NB: rebuild_relation does table_close() on OldHeap */
-
-	pgstat_progress_end_command();
 }
 
 /*
@@ -1632,9 +1635,10 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 
 /* Cluster each relation in a separate transaction */
 static void
-cluster_multiple_rels(List *rvs, int options)
+cluster_multiple_rels(List *rvs, int options, bool ispartitioned)
 {
 	ListCell *lc;
+	off_t	  childnum = 0;
 
 	/* Commit to get out of starting transaction */
 	PopActiveSnapshot();
@@ -1657,6 +1661,11 @@ cluster_multiple_rels(List *rvs, int options)
 		cluster_rel(rvtc->tableOid, rvtc->indexOid,
 					&cluster_params);
 
+		if (ispartitioned)
+			pgstat_progress_update_param(PROGRESS_CLUSTER_PARTITIONS_DONE,
+					++childnum);
+
+
 		PopActiveSnapshot();
 		CommitTransactionCommand();
 	}
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 662aff04b4..1d8a7ca774 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -37,6 +37,7 @@
 #include "catalog/pg_namespace.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
+#include "commands/progress.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -1937,7 +1938,11 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
 			cluster_params.options |= CLUOPT_VERBOSE;
 
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
+		pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, relid);
+		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+				PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
 		cluster_rel(relid, InvalidOid, &cluster_params);
+		pgstat_progress_end_command();
 	}
 	else
 		table_relation_vacuum(onerel, params, vac_strategy);
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index c6b139d57d..d85781ae0b 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -60,6 +60,9 @@
 #define PROGRESS_CLUSTER_TOTAL_HEAP_BLKS		5
 #define PROGRESS_CLUSTER_HEAP_BLKS_SCANNED		6
 #define PROGRESS_CLUSTER_INDEX_REBUILD_COUNT	7
+/* Need to avoid the CREATE INDEX params */
+#define PROGRESS_CLUSTER_PARTITIONS_TOTAL	17
+#define PROGRESS_CLUSTER_PARTITIONS_DONE	18
 
 /* Phases of cluster (as advertised via PROGRESS_CLUSTER_PHASE) */
 #define PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP	1
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 9b59a7b4a5..a8bcc599ac 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1943,7 +1943,9 @@ pg_stat_progress_cluster| SELECT s.pid,
     s.param5 AS heap_tuples_written,
     s.param6 AS heap_blks_total,
     s.param7 AS heap_blks_scanned,
-    s.param8 AS index_rebuild_count
+    s.param8 AS index_rebuild_count,
+    s.param18 AS partitions_total,
+    s.param19 AS partitions_done
    FROM (pg_stat_get_progress_info('CLUSTER'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
      LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_copy| SELECT s.pid,
-- 
2.17.0

v10-0004-Propagate-changes-to-indisclustered-to-child-par.patchtext/x-diff; charset=us-asciiDownload

From c98512d58e64f98ae87558da594d379759df80e1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 22:11:12 -0500
Subject: [PATCH v10 4/8] Propagate changes to indisclustered to child/parents

---
 src/backend/commands/cluster.c        | 110 ++++++++++++++++----------
 src/backend/commands/indexcmds.c      |   2 +
 src/test/regress/expected/cluster.out |  46 +++++++++++
 src/test/regress/sql/cluster.sql      |  11 +++
 4 files changed, 126 insertions(+), 43 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 7d8d1d8a69..dd8014c206 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -74,6 +74,7 @@ static void rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose);
 static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
+static void set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
 static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
 		Oid indexOid);
@@ -513,66 +514,89 @@ check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMOD
 	index_close(OldIndex, NoLock);
 }
 
+/*
+ * Helper for mark_index_clustered
+ * Mark a single index as clustered or not.
+ * pg_index is passed by caller to avoid repeatedly re-opening it.
+ */
+static void
+set_indisclustered(Oid indexOid, bool isclustered, Relation pg_index)
+{
+	HeapTuple	indexTuple;
+	Form_pg_index indexForm;
+
+	indexTuple = SearchSysCacheCopy1(INDEXRELID,
+									ObjectIdGetDatum(indexOid));
+	if (!HeapTupleIsValid(indexTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexOid);
+	indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+
+	/* this was checked earlier, but let's be real sure */
+	if (isclustered && !indexForm->indisvalid)
+		elog(ERROR, "cannot cluster on invalid index %u", indexOid);
+
+	indexForm->indisclustered = isclustered;
+	CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
+	heap_freetuple(indexTuple);
+}
+
 /*
  * mark_index_clustered: mark the specified index as the one clustered on
  *
- * With indexOid == InvalidOid, will mark all indexes of rel not-clustered.
+ * With indexOid == InvalidOid, mark all indexes of rel not-clustered.
+ * Otherwise, mark children of the clustered index as clustered, and parents of
+ * other indexes as unclustered.
+ * We wish to maintain the following properties:
+ * 1) Only one index on a relation can be marked clustered at once
+ * 2) If a partitioned index is clustered, then all its children must be
+ *     clustered.
  */
 void
 mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 {
-	HeapTuple	indexTuple;
-	Form_pg_index indexForm;
-	Relation	pg_index;
-	ListCell   *index;
-
-	/*
-	 * If the index is already marked clustered, no need to do anything.
-	 */
-	if (OidIsValid(indexOid))
-	{
-		if (get_index_isclustered(indexOid))
-			return;
-	}
+	ListCell	*lc, *lc2;
+	List		*indexes;
+	Relation	pg_index = table_open(IndexRelationId, RowExclusiveLock);
+	List		*inh = find_all_inheritors(RelationGetRelid(rel),
+											ShareRowExclusiveLock, NULL);
 
 	/*
 	 * Check each index of the relation and set/clear the bit as needed.
+	 * Iterate over the relation's children rather than the index's children
+	 * since we need to unset cluster for indexes on intermediate children,
+	 * too.
 	 */
-	pg_index = table_open(IndexRelationId, RowExclusiveLock);
-
-	foreach(index, RelationGetIndexList(rel))
+	foreach(lc, inh)
 	{
-		Oid			thisIndexOid = lfirst_oid(index);
-
-		indexTuple = SearchSysCacheCopy1(INDEXRELID,
-										 ObjectIdGetDatum(thisIndexOid));
-		if (!HeapTupleIsValid(indexTuple))
-			elog(ERROR, "cache lookup failed for index %u", thisIndexOid);
-		indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
+		Oid			inhrelid = lfirst_oid(lc);
+		Relation	thisrel = table_open(inhrelid, ShareRowExclusiveLock);
 
-		/*
-		 * Unset the bit if set.  We know it's wrong because we checked this
-		 * earlier.
-		 */
-		if (indexForm->indisclustered)
+		indexes = RelationGetIndexList(thisrel);
+		foreach (lc2, indexes)
 		{
-			indexForm->indisclustered = false;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
-		else if (thisIndexOid == indexOid)
-		{
-			/* this was checked earlier, but let's be real sure */
-			if (!indexForm->indisvalid)
-				elog(ERROR, "cannot cluster on invalid index %u", indexOid);
-			indexForm->indisclustered = true;
-			CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple);
-		}
+			bool	isclustered;
+			Oid		thisIndexOid = lfirst_oid(lc2);
+			List	*parentoids = get_rel_relispartition(thisIndexOid) ?
+				get_partition_ancestors(thisIndexOid) : NIL;
 
-		InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
-									 InvalidOid, is_internal);
+			/*
+			 * A child of the clustered index must be set clustered;
+			 * indexes which are not children of the clustered index are
+			 * set unclustered
+			 */
+			isclustered = (thisIndexOid == indexOid) ||
+					list_member_oid(parentoids, indexOid);
+			Assert(OidIsValid(indexOid) || !isclustered);
+			set_indisclustered(thisIndexOid, isclustered, pg_index);
+
+			InvokeObjectPostAlterHookArg(IndexRelationId, thisIndexOid, 0,
+										 InvalidOid, is_internal);
+		}
 
-		heap_freetuple(indexTuple);
+		list_free(indexes);
+		table_close(thisrel, ShareRowExclusiveLock);
 	}
+	list_free(inh);
 
 	table_close(pg_index, RowExclusiveLock);
 }
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 166374cc0c..7d176c1062 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -25,6 +25,7 @@
 #include "catalog/catalog.h"
 #include "catalog/index.h"
 #include "catalog/indexing.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
 #include "catalog/pg_constraint.h"
 #include "catalog/pg_inherits.h"
@@ -32,6 +33,7 @@
 #include "catalog/pg_opfamily.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_type.h"
+#include "commands/cluster.h"
 #include "commands/comment.h"
 #include "commands/dbcommands.h"
 #include "commands/defrem.h"
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 8fb496434e..a2af5c15ec 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -500,6 +500,52 @@ Indexes:
     "clstrpart_idx" btree (a) CLUSTER
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a)
+
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+            Table "public.clstrpart33"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart3 DEFAULT
+Indexes:
+    "clstrpart33_a_idx" btree (a) CLUSTER
+
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
+       Partitioned table "public.clstrpart1"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (1) TO (10)
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart1_a_idx" btree (a)
+    "clstrpart1_idx_2" btree (a) CLUSTER
+Number of partitions: 2 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 57fc661863..3b7fa3142f 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -226,6 +226,17 @@ CLUSTER clstrpart1 USING clstrpart1_a_idx; -- partition which is itself partitio
 CLUSTER clstrpart12 USING clstrpart12_a_idx; -- partition which is itself partitioned, no childs
 CLUSTER clstrpart2 USING clstrpart2_a_idx; -- leaf
 \d clstrpart
+-- Test that it recurses to grandchildren:
+\d clstrpart33
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+\d clstrpart33
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+\d clstrpart33
+-- Check that only one child is marked clustered after marking clustered on a different parent
+CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
+\d clstrpart1
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v10-0005-Invalidate-parent-indexes.patchtext/x-diff; charset=us-asciiDownload

From 6badf17a18b344cc1be6dcae195478011889ee43 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 18:58:03 -0600
Subject: [PATCH v10 5/8] Invalidate parent indexes

---
 src/backend/commands/cluster.c        | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 26 ++++++++++++++++++++++++++
 src/test/regress/sql/cluster.sql      |  8 ++++++++
 3 files changed, 55 insertions(+)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index dd8014c206..1b8ff911b0 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -598,6 +598,27 @@ mark_index_clustered(Relation rel, Oid indexOid, bool is_internal)
 	}
 	list_free(inh);
 
+	/*
+	 * Set parent of all indexes as unclustered when a rel is unclustered; and,
+	 * when an index is clustered, set parents of all /other/ indexes as
+	 * unclustered.
+	 */
+	indexes = RelationGetIndexList(rel);
+	foreach (lc, indexes)
+	{
+		Oid	thisIndexOid = lfirst_oid(lc);
+
+		if (thisIndexOid == indexOid)
+			continue;
+
+		while (get_rel_relispartition(thisIndexOid))
+		{
+			thisIndexOid = get_partition_parent(thisIndexOid, true);
+			set_indisclustered(thisIndexOid, false, pg_index);
+		}
+	}
+	list_free(indexes);
+
 	table_close(pg_index, RowExclusiveLock);
 }
 
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index a2af5c15ec..c316c41905 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -546,6 +546,32 @@ Indexes:
     "clstrpart1_idx_2" btree (a) CLUSTER
 Number of partitions: 2 (Use \d+ to list them.)
 
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 3b7fa3142f..1cc7c1aaca 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -237,6 +237,14 @@ CREATE INDEX clstrpart1_idx_2 ON clstrpart1(a);
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 CLUSTER ON clstrpart1_idx_2;
 \d clstrpart1
+-- Check that the parent index is marked not clustered after clustering a partition on a different index:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CLUSTER clstrpart1 USING clstrpart1_idx_2;
+\d clstrpart
+-- Check that the parent index is marked not clustered after setting a partition not clustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v10-0006-Invalidate-parent-index-cluster-on-attach.patchtext/x-diff; charset=us-asciiDownload

From 8bb301a35c7319967a204e962a88c47ce15138d5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Thu, 5 Nov 2020 19:11:41 -0600
Subject: [PATCH v10 6/8] Invalidate parent index cluster on attach

---
 src/backend/commands/indexcmds.c      | 21 +++++++++++++++++++++
 src/test/regress/expected/cluster.out | 14 ++++++++++++++
 src/test/regress/sql/cluster.sql      |  5 +++++
 3 files changed, 40 insertions(+)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 7d176c1062..bf13bed627 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -4110,6 +4110,27 @@ IndexSetParentIndex(Relation partitionIdx, Oid parentOid)
 	/* set relispartition correctly on the partition */
 	update_relispartition(partRelid, OidIsValid(parentOid));
 
+	/*
+	 * If the attached index is not clustered, invalidate cluster mark on
+	 * any parents
+	 */
+	if ((OidIsValid(parentOid) && get_index_isclustered(parentOid)) ||
+			get_index_isclustered(partRelid))
+	{
+		Relation indrel;
+
+		/* Make relispartition visible */
+		CommandCounterIncrement();
+
+		indrel = table_open(IndexGetRelation(partRelid, false),
+				ShareUpdateExclusiveLock);
+		mark_index_clustered(indrel,
+				get_index_isclustered(partRelid) ? partRelid : InvalidOid,
+				true);
+		table_close(indrel, ShareUpdateExclusiveLock);
+
+	}
+
 	if (fix_dependencies)
 	{
 		/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index c316c41905..846975fc06 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -572,6 +572,20 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 3 (Use \d+ to list them.)
 
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 4 (Use \d+ to list them.)
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 1cc7c1aaca..ee14e7079f 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -245,6 +245,11 @@ CLUSTER clstrpart1 USING clstrpart1_idx_2;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ALTER TABLE clstrpart1 SET WITHOUT CLUSTER;
 \d clstrpart
+-- Check that attaching an unclustered index marks the parent unclustered:
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
+ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
+\d clstrpart
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v10-0007-Preserve-indisclustered-on-children-of-clustered.patchtext/x-diff; charset=us-asciiDownload

From f0ffdeb4b8fc950003e3426e83ed512f79828442 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Tue, 6 Oct 2020 20:40:18 -0500
Subject: [PATCH v10 7/8] Preserve indisclustered on children of clustered,
 partitioned indexes

Note, this takes a parentIndex, but that wasn't previously used ...
UpdateIndexRelation(Oid indexoid, Oid heapoid, Oid parentIndexId,
---
 src/backend/catalog/index.c           |  3 ++-
 src/test/regress/expected/cluster.out | 12 ++++++++++++
 src/test/regress/sql/cluster.sql      |  4 ++++
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index a628b3281c..8e97d88c32 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -615,7 +615,8 @@ UpdateIndexRelation(Oid indexoid,
 	values[Anum_pg_index_indisprimary - 1] = BoolGetDatum(primary);
 	values[Anum_pg_index_indisexclusion - 1] = BoolGetDatum(isexclusion);
 	values[Anum_pg_index_indimmediate - 1] = BoolGetDatum(immediate);
-	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(false);
+	values[Anum_pg_index_indisclustered - 1] = BoolGetDatum(OidIsValid(parentIndexId) &&
+												get_index_isclustered(parentIndexId));
 	values[Anum_pg_index_indisvalid - 1] = BoolGetDatum(isvalid);
 	values[Anum_pg_index_indcheckxmin - 1] = BoolGetDatum(false);
 	values[Anum_pg_index_indisready - 1] = BoolGetDatum(isready);
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 846975fc06..9429482980 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -586,6 +586,18 @@ Indexes:
     "clstrpart_idx" btree (a)
 Number of partitions: 4 (Use \d+ to list them.)
 
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
+             Table "public.clstrpart4"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition of: clstrpart FOR VALUES FROM (30) TO (40)
+Indexes:
+    "clstrpart4_a_idx" btree (a) CLUSTER
+
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index ee14e7079f..af0849904d 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -250,6 +250,10 @@ ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 CREATE TABLE clstrpart5 (LIKE clstrpart INCLUDING INDEXES);
 ALTER TABLE clstrpart ATTACH PARTITION clstrpart5 FOR VALUES FROM (40)TO(50);
 \d clstrpart
+-- Check that new children inherit clustered mark
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+CREATE TABLE clstrpart4 PARTITION OF clstrpart FOR VALUES FROM (30)TO(40);
+\d clstrpart4
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

v10-0008-pg_dump-partitioned-index-depend-on-its-partitio.patchtext/x-diff; charset=us-asciiDownload

From a183a1ad79d00e25dfe4912e6c170631b7a0258c Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 25 Nov 2020 17:34:07 -0600
Subject: [PATCH v10 8/8] pg_dump: partitioned index depend on its partitions

This is required for restoring clustered parent index, which is marked INVALID
until indexes have been built on all its child tables, and it's prohibited to
CLUSTER ON an INVALID index

See also: 8cff4f5348d075e063100071013f00a900c32b0f
---
 src/backend/commands/tablecmds.c | 6 +++---
 src/bin/pg_dump/common.c         | 8 ++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 88a68a4697..aaf955458c 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -18153,6 +18153,9 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		table_close(idxRel, RowExclusiveLock);
 	}
 
+	/* make sure we see the validation we just did */
+	CommandCounterIncrement();
+
 	/*
 	 * If this index is in turn a partition of a larger index, validating it
 	 * might cause the parent to become valid also.  Try that.
@@ -18164,9 +18167,6 @@ validatePartitionedIndex(Relation partedIdx, Relation partedTbl)
 		Relation	parentIdx,
 					parentTbl;
 
-		/* make sure we see the validation we just did */
-		CommandCounterIncrement();
-
 		parentIdxId = get_partition_parent(RelationGetRelid(partedIdx), false);
 		parentTblId = get_partition_parent(RelationGetRelid(partedTbl), false);
 		parentIdx = relation_open(parentIdxId, AccessExclusiveLock);
diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 1a261a5545..cdfba058fc 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -429,6 +429,12 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			attachinfo[k].parentIdx = parentidx;
 			attachinfo[k].partitionIdx = index;
 
+			/*
+			 * We want dependencies from parent to partition (so that the
+			 * partition index is created first)
+			 */
+			addObjectDependency(&parentidx->dobj, index->dobj.dumpId);
+
 			/*
 			 * We must state the DO_INDEX_ATTACH object's dependencies
 			 * explicitly, since it will not match anything in pg_depend.
@@ -446,6 +452,8 @@ flagInhIndexes(Archive *fout, TableInfo tblinfo[], int numTables)
 			 */
 			addObjectDependency(&attachinfo[k].dobj, index->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj, parentidx->dobj.dumpId);
+			// addObjectDependency(&parentidx->dobj, attachinfo[k].dobj.dumpId);
+
 			addObjectDependency(&attachinfo[k].dobj,
 								index->indextable->dobj.dumpId);
 			addObjectDependency(&attachinfo[k].dobj,
-- 
2.17.0

#11

Zhihong Yu

zyu@yugabyte.com

almost 5 years ago

In reply to: Justin Pryzby (#10)

Re: CLUSTER on partitioned index

Hi,
For v10-0002-Implement-CLUSTER-of-partitioned-table.patch :

or that an partitioned index was previously set clustered.

'an partitioned index' -> a partitioned index

+ * Return a List of tables and associated index, where each index is a

associated index -> associated indices

For cluster():
-       rel = table_open(tableOid, NoLock);
+       rel = table_open(tableOid, ShareUpdateExclusiveLock);

Considering the comment preceding cluster() (forced to acquire exclusive
locks on all the tables), maybe add a comment explaining why it is safe to
take ShareUpdateExclusiveLock.

+cluster_multiple_rels(List *rvs, int options)

I think the multiple in the method name is not needed since the relation is
in plural.

Cheers

On Fri, Apr 2, 2021 at 1:03 PM Justin Pryzby <pryzby@telsasoft.com> wrote:

Show quoted text

@cfbot: rebased

#12

Alvaro Herrera

alvherre@2ndquadrant.com

over 4 years ago

In reply to: Justin Pryzby (#10)

Re: CLUSTER on partitioned index

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time. What justifies
spending so much time on this implementation? My impression is that
CLUSTER is pretty much a fringe command nowadays, because of the access
exclusive lock required.

Does anybody actually use it?

--
Álvaro Herrera 39°49'30"S 73°17'W — https://www.EnterpriseDB.com/
"Ninguna manada de bestias tiene una voz tan horrible como la humana" (Orual)

#13

Michael Paquier

michael@paquier.xyz

over 4 years ago

In reply to: Alvaro Herrera (#12)

Re: CLUSTER on partitioned index

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time. What justifies
spending so much time on this implementation? My impression is that
CLUSTER is pretty much a fringe command nowadays, because of the access
exclusive lock required.

Does anybody actually use it?

Yeah, I am not getting really excited about doing anything here
either. I thought for some time about the interactions with
indisclustered and partitioned tables, but anything I could come up
with felt clunky.
--
Michael

#14

Laurenz Albe

laurenz.albe@cybertec.at

over 4 years ago

In reply to: Alvaro Herrera (#12)

Re: CLUSTER on partitioned index

On Tue, 2021-07-20 at 20:27 -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time. What justifies
spending so much time on this implementation? My impression is that
CLUSTER is pretty much a fringe command nowadays, because of the access
exclusive lock required.

Does anybody actually use it?

I see is used in the field occasionally, as it can really drastically
improve performance. But I admit is is not frequently used.

In a data warehouse, which is updated only occasionally, running
CLUSTER after an update can make a lot of sense.

I personally think that it is enough to be able to cluster the table
partiton by partition.

Yours,
Laurenz Albe

#15

Justin Pryzby

pryzby@telsasoft.com

over 4 years ago

In reply to: Alvaro Herrera (#12)

1 attachment(s)

Re: CLUSTER on partitioned index

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time.

The cluster now is done one partition at a time, so it might take a long time,
but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and
(since v14) REINDEX.

The patch series would be simpler if partitioned indexes weren't allowed to be
marked CLUSTERED ON. Then, "USING <index>" would be required, which is a step
forward from not supporting cluster on partitioned index at all. As attached.
It's arguably true that the follow-up patches supporting indisclustered on
partitioned indexes aren't worth the trouble.

For sure CLUSTER is useful, see eg.
https://github.com/bucardo/check_postgres/issues/29

It's sometimes important that the table is clustered to allow index scan to
work well (or be chosen at all).

If a table is scanned by an index, and isn't well-clustered, then a larger
fraction (multiple) of the table will be read than what's optimal. That
requires more IO, and more cache space.

A year ago, I partitioned one of our previously-unpartitioned tables, and ended
up clustering the partitions on their partition key (and indexed column) using
\gexec. This was preferable to doing INSERT .. SELECT .. ORDER BY, which
would've made the initial process slower - maybe unjustifiably slower for some
customers. Cluster (using \gexec) was something I was able to do afterward,
for completeness, since I expect the partitions to be mostly-clustered
automatically, so it was bothering me that the existing data was unordered, and
that it might behave differently in the future.

What justifies spending so much time on this implementation?

Actually, I don't use partitioned indexes at all here, so this is not for us..
I worked on this after Adger asked about CIC on partitioned tables (for which I
have a patch in the queue). Isn't it worth supporting that (or should we
include an example about how to use format() with %I and \gexec) ?

VACUUM [FULL] has recursed into partitions since v10 (f0e44751d).
REINDEX supports partitioned tables in v14 (a6642b3ae).
Partitioned indexes exist since v11 (as you well know), so it's somewhat odd
that CLUSTER isn't supported, and seems increasingly weird as decreasing number
of DDL commands are not supported. Supporting DDL on partitioned tables
supports the idea that the physical partitions can be seen as an implementation
detail by the DBA, which I understand was the intent since v10.

You're right that I wouldn't plan to *routinely* re-cluster a partitioned
table. Rather, I'd cluster only its "recent" *partitions*, and leave the old
ones alone. Or cluster the partitions, a single time, once they're no longer
recent. I don't think the feature is marginal just because I don't use it
routinely.

My impression is that CLUSTER is pretty much a fringe command nowadays,
because of the access exclusive lock required.

A step forward would be to integrate something like pg_repack/reorg/squeeze.
I used pg_repack --index until v12 got REINDEX CONCURRENTLY. The goal there
was to improve index scans on some large, append-only partitions where the
planner gave an index scan, but performance was poor (now, we use BRIN so it
works well without reindex). I tested that this would still be an issue by
creating a non-brin index for a single day's table (even with v13 deduplication
and v12 TID tiebreak).

As I see it, support for partitioned cluster is orthogonal to an
online/concurrent cluster, which is a job for another patch.

--
Justin

Attachments:

v11-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From d2e7daf9c05cd3c20c60f5e35c0d6b2612d75d1b Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v11] Implement CLUSTER of partitioned table..

For now, partitioned indexes cannot be marked clustered, so clustering requires
specification of a partitioned index on the partitioned table.

VACUUM (including vacuum full) has recursed into partitione tables since
partitioning were introduced in v10 (3c3bb9933).  See expand_vacuum_rel().

See also a556549d7 and 19de0ab23.
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 170 +++++++++++++++++++-------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/commands/cluster.h        |   1 +
 src/test/regress/expected/cluster.out |  46 ++++++-
 src/test/regress/sql/cluster.sql      |  22 +++-
 6 files changed, 197 insertions(+), 49 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 86f5fdc469..b3463ae5c4 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -196,6 +196,12 @@ CLUSTER [VERBOSE]
     in the <structname>pg_stat_progress_cluster</structname> view. See
     <xref linkend="cluster-progress-reporting"/> for details.
   </para>
+
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    partition of the specified partitioned index (which must be specified).
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 9d22f648a8..c5923c5217 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -131,6 +136,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		Relation	rel;
 
 		/* Find, lock, and check permissions on the table */
+		/* Obtain AEL now to avoid lock-upgrade hazard in the single-transaction case */
 		tableOid = RangeVarGetRelidExtended(stmt->relation,
 											AccessExclusiveLock,
 											0,
@@ -146,14 +152,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -188,11 +186,45 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 								stmt->indexname, stmt->relation->relname)));
 		}
 
-		/* close relation, keep lock till commit */
-		table_close(rel, NoLock);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* close relation, keep lock till commit */
+			table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, &params);
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, &params);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+
+			/*
+			 * For now, partitioned indexes are not actually marked clustered.
+			 */
+			check_index_is_clusterable(rel, indexOid, true, AccessShareLock);
+
+			/* close relation, releasing lock on parent table */
+			table_close(rel, AccessExclusiveLock);
+
+			/* Do the job. */
+			cluster_multiple_rels(rvs, params.options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -202,7 +234,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -225,28 +256,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-			ClusterParams cluster_params = params;
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_params.options |= CLUOPT_RECHECK;
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						&cluster_params);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -327,10 +337,11 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		/*
 		 * Silently skip a temp table for a remote session.  Only doing this
 		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER), because there is
-		 * another check in cluster() which will stop any attempt to cluster
-		 * remote temp tables by name.  There is another check in cluster_rel
-		 * which is redundant, but we leave it for extra safety.
+		 * somebody is executing a database-wide CLUSTER or on a partitioned
+		 * table), because there is another check in cluster() which will stop
+		 * any attempt to cluster remote temp tables by name.  There is another
+		 * check in cluster_rel which is redundant, but we leave it for extra
+		 * safety.
 		 */
 		if (RELATION_IS_OTHER_TEMP(OldHeap))
 		{
@@ -352,9 +363,10 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -415,6 +427,9 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		return;
 	}
 
+	// Should silently skip this rather than assert ?
+	Assert(RELKIND_HAS_STORAGE(OldHeap->rd_rel->relkind));
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -585,8 +600,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
 	if (OidIsValid(indexOid))
+		/* Mark the correct index as clustered */
 		mark_index_clustered(OldHeap, indexOid, true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1604,3 +1619,74 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and their associated indexes, where each index is a
+ * partition of the given index
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	/*
+	 * We have to build the list in a different memory context so it will
+	 * survive the cross-transaction processing
+	 */
+	old_context = MemoryContextSwitchTo(cluster_context);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		if (!RELKIND_HAS_STORAGE(get_rel_relkind(indexrelid)))
+			continue;
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+	}
+
+	MemoryContextSwitchTo(old_context);
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+		ClusterParams cluster_params = { .options = options, };
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_params.options |= CLUOPT_RECHECK;
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					&cluster_params);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5cd5838668..c377511278 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -599,6 +599,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index b64d3bc204..d400b6e937 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -22,6 +22,7 @@
 /* flag bits for ClusterParams->flags */
 #define CLUOPT_RECHECK 0x01		/* recheck relation state */
 #define CLUOPT_VERBOSE 0x02		/* print progress info */
+#define CLUOPT_RECHECK_ISCLUSTERED 0x04	/* recheck relation state for indisclustered */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e46a66952f..7461e1c090 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -444,14 +444,52 @@ DROP TABLE clustertest;
 CREATE TABLE clustertest (f1 int PRIMARY KEY);
 CLUSTER clustertest USING clustertest_pkey;
 CLUSTER clustertest;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+CLUSTER clstrpart USING clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- Partitioned indexes aren't and can't be marked un/clustered:
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+CLUSTER clstrpart;
+ERROR:  there is no previously clustered index for table "clstrpart"
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
-CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
-DROP TABLE clstrpart;
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index aee9cf83e0..f31f8ef2b2 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -202,12 +202,28 @@ CREATE TABLE clustertest (f1 int PRIMARY KEY);
 CLUSTER clustertest USING clustertest_pkey;
 CLUSTER clustertest;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
-DROP TABLE clstrpart;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+-- Partitioned indexes aren't and can't be marked un/clustered:
+\d clstrpart
+CLUSTER clstrpart;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 
 -- Test CLUSTER with external tuplesorting
 
-- 
2.17.0

#16

Matthias van de Meent

boekewurm+postgres@gmail.com

over 4 years ago

In reply to: Justin Pryzby (#15)

Re: CLUSTER on partitioned index

On Sun, 12 Sept 2021 at 22:10, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time.

The cluster now is done one partition at a time, so it might take a long time,
but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and
(since v14) REINDEX.

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

I noticed that you store the result of find_all_inheritors(...,
NoLock) in get_tables_to_cluster_partitioned, without taking care of
potential concurrent partition hierarchy changes that the comment on
find_all_inheritors warns against or documenting why it is safe, which
sounds dangerous in case someone wants to adapt the code. One problem
I can think of is that only storing reloid and indoid is not
necessarily safe, as they might be reused by drop+create table running
parallel to the CLUSTER command.

Apart from that, I think it would be useful (though not strictly
necessary for this patch) if you could adapt the current CLUSTER
progress reporting to report the progress for the whole partition
hierarchy, instead of a new progress report for each relation, as was
the case in earlier versions of the patchset.

The v11 patch seems quite incomplete when compared to previous
discussions, or at the very least is very limited (no ALTER TABLE
clustering commands for partitioned tables, but `CLUSTER ptable USING
pindex` is supported). If v11 is the new proposed direction for ptable
clustering, could you also document these limitations in the
cluster.sgml and alter_table.sgml docs?

[ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]

diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
...
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ERROR:  cannot mark index clustered in partitioned table

This error message does not seem to match my expectation as a user: I
am not trying to mark an index as clustered, and for a normal table
"SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
that behaviour of normal unclustered tables should be shared here as
well. At the very least, the error message should be changed.

ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
ERROR: cannot mark index clustered in partitioned table

A "HINT: use the CLUSTER command to cluster partitioned tables" (or
equivalent) should be added if we decide to keep the clustering APIs
of ALTER TABLE disabled for partitioned tables, as CLUSTER is now
implemented for partitioned tables.

-DROP TABLE clstrpart;

I believe that this cleanup should not be fully removed, but moved to
before '-- Test CLUSTER with external tuplesorting', as the table is
not used after that line.

Kind regards,

Matthias van de Meent

#17

Justin Pryzby

pryzby@telsasoft.com

over 4 years ago

In reply to: Matthias van de Meent (#16)

Re: CLUSTER on partitioned index

On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote:

On Sun, 12 Sept 2021 at 22:10, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time.

The cluster now is done one partition at a time, so it might take a long time,
but doesn't lock the whole partition heirarchy. Same as VACUUM (since v10) and
(since v14) REINDEX.

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

Alvaro's critique was that the patchset was too complicated for what was
claimed to be a marginal feature. My response was to rearrange the patchset to
its minimal form, supporting CLUSTER without marking the index as clustered.

I noticed that you store the result of find_all_inheritors(...,
NoLock) in get_tables_to_cluster_partitioned, without taking care of
potential concurrent partition hierarchy changes that the comment on
find_all_inheritors warns against or documenting why it is safe, which
sounds dangerous in case someone wants to adapt the code. One problem
I can think of is that only storing reloid and indoid is not
necessarily safe, as they might be reused by drop+create table running
parallel to the CLUSTER command.

The parallel code in vacuum is expand_vacuum_rel(), which is where the
corresponding things happens for vacuum full. This patch is to make cluster()
do all the same stuff before calling cluster_rel().

What VACUUM tries to do is to avoid erroring if a partition is dropped while
cluster is running. cluster_rel() does the same thing by calling
cluster_multiple_rels() ,which uses CLUOPT_RECHECK.

If the OIDs wrapped around, I think existing vacuum could accidentally process
a new table with the same OID as a dropped partition. I think cluster would
*normally* catch that case and error in check_index_is_clusterable():
| Check that index is in fact an index on the given relation

Arguably VACUUM FULL could call cluster() (not cluster_rel()) and pass the
partitioned table rather than first expanding it. But non-full vacuum needs to
expand partitioned tables anyway.

Apart from that, I think it would be useful (though not strictly
necessary for this patch) if you could adapt the current CLUSTER
progress reporting to report the progress for the whole partition
hierarchy, instead of a new progress report for each relation, as was
the case in earlier versions of the patchset.

Yea, but this is already true for VACUUM FULL (which uses CLUSTER and supports
partitioned tables since v10) and REINDEX.
See also /messages/by-id/20210216064214.GI28165@telsasoft.com

My goal is to present a minimal patch and avoid any nonessential complexity.

The v11 patch seems quite incomplete when compared to previous
discussions, or at the very least is very limited (no ALTER TABLE
clustering commands for partitioned tables, but `CLUSTER ptable USING
pindex` is supported). If v11 is the new proposed direction for ptable
clustering, could you also document these limitations in the
cluster.sgml and alter_table.sgml docs?

You said it's less complete, but it's is due to deliberate reduction in scope.
cluster.sgml says:
+    Clustering a partitioned table clusters each of its partitions using the
+    partition of the specified partitioned index (which must be specified).

The ALTER restriction hasn't changed, so I didn't touch the documentation.

I am still curious myself to know if this is the direction the patch should
move.

[ v11-0001-Implement-CLUSTER-of-partitioned-table.patch ]
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
...
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ERROR:  cannot mark index clustered in partitioned table
This error message does not seem to match my expectation as a user: I
am not trying to mark an index as clustered, and for a normal table
"SET WITHOUT CLUSTER" does not fail for unclustered tables. I think
that behaviour of normal unclustered tables should be shared here as
well. At the very least, the error message should be changed.

This is the pre-existing behavior.

-DROP TABLE clstrpart;

I believe that this cleanup should not be fully removed, but moved to
before '-- Test CLUSTER with external tuplesorting', as the table is
not used after that line.

You're right - this was from when the patchset handled CLUSTER ON.
Leaving the index allows testing in pg_dump - a large part of the complexity of
the elided patches is to handle restoring a partitioned index, without
violating the rule that partitions of an clustered index must also be
clustered. I adjusted this in my local branch.

Thanks for looking. I'm going to see about updating comments based on
corresponding parts of vacuum and on this message itself.

--
Justin

#18

Michael Paquier

michael@paquier.xyz

about 4 years ago

In reply to: Justin Pryzby (#17)

Re: CLUSTER on partitioned index

On Thu, Sep 23, 2021 at 06:56:26PM -0500, Justin Pryzby wrote:

On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote:

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

Alvaro's critique was that the patchset was too complicated for what was
claimed to be a marginal feature. My response was to rearrange the patchset to
its minimal form, supporting CLUSTER without marking the index as clustered.

My goal is to present a minimal patch and avoid any nonessential complexity.

FWIW, my opinion on the matter is similar to Alvaro's, and an extra
read of the patch gives me the same impression. Let's see if others
have an opinion on the matter.

Thanks for looking. I'm going to see about updating comments based on
corresponding parts of vacuum and on this message itself.

It doesn't feel right to just discard the patch at this stage, and it
needs an update, so I have moved it to the next CF for now, waiting on
author. If this does not really move on, my suggestion is to discard
the patch at the end of next CF, aka 2022-01.
--
Michael

#19

Justin Pryzby

pryzby@telsasoft.com

about 4 years ago

In reply to: Michael Paquier (#18)

1 attachment(s)

Re: CLUSTER on partitioned index

On Fri, Dec 03, 2021 at 10:16:24AM +0900, Michael Paquier wrote:

On Thu, Sep 23, 2021 at 06:56:26PM -0500, Justin Pryzby wrote:

On Thu, Sep 23, 2021 at 08:18:41PM +0200, Matthias van de Meent wrote:

Note: The following review is based on the assumption that this v11
revision was meant to contain only one patch. I put this up as a note,
because it seemed quite limited when compared to earlier versions of
the patchset.

Alvaro's critique was that the patchset was too complicated for what was
claimed to be a marginal feature. My response was to rearrange the patchset to
its minimal form, supporting CLUSTER without marking the index as clustered.

My goal is to present a minimal patch and avoid any nonessential complexity.

FWIW, my opinion on the matter is similar to Alvaro's, and an extra
read of the patch gives me the same impression. Let's see if others
have an opinion on the matter.

You and Alvaro thought the patch was too complicated for its value, so I
reduced the scope to its essential form.

CLUSTER was claimed to be of marginal utility, since the table is locked.
But locking one partition at a time would be less disruptive than locking an
equivalent non-partitioned table.

There's only about a dozen, other remaining restrictions/limitations on
partitioned tables, (AMs, triggers, identity, generated, exclusion, CIC,
FREEZE).

Since it's supported to VACUUM (including VACUUM FULL) and REINDEX a
partitioned table, I'm still suprised there's much hesitation to support
CLUSTER (which is used by vacuum full).

Thanks for looking. I'm going to see about updating comments based on
corresponding parts of vacuum and on this message itself.

It doesn't feel right to just discard the patch at this stage, and it
needs an update, so I have moved it to the next CF for now, waiting on
author. If this does not really move on, my suggestion is to discard
the patch at the end of next CF, aka 2022-01.

This includes minor updates based on Mathias review (commit message and test
case).

--
Justin

Attachments:

v11-0001-Implement-CLUSTER-of-partitioned-table.patchtext/x-diff; charset=us-asciiDownload

From cbeb98a016a05738007c98ef0a3828bdedf83d24 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sun, 7 Jun 2020 16:58:42 -0500
Subject: [PATCH v11] Implement CLUSTER of partitioned table..

VACUUM (including vacuum full) has recursed into partitioned tables since
partitioning were introduced in v10 (3c3bb9933).  See expand_vacuum_rel().

For VACUUM FULL, vacuum_rel() calls cluster_rel() for each partition.
This patch is to make cluster() do all the same stuff before calling
cluster_rel().

For now, partitioned indexes cannot be marked clustered, so clustering requires
specification of a partitioned index on the partitioned table.

See also a556549d7 and 19de0ab23.
---
 doc/src/sgml/ref/cluster.sgml         |   6 +
 src/backend/commands/cluster.c        | 177 ++++++++++++++++++++------
 src/bin/psql/tab-complete.c           |   1 +
 src/include/commands/cluster.h        |   1 +
 src/test/regress/expected/cluster.out |  45 ++++++-
 src/test/regress/sql/cluster.sql      |  21 ++-
 6 files changed, 204 insertions(+), 47 deletions(-)

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 86f5fdc469b..b3463ae5c46 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -196,6 +196,12 @@ CLUSTER [VERBOSE]
     in the <structname>pg_stat_progress_cluster</structname> view. See
     <xref linkend="cluster-progress-reporting"/> for details.
   </para>
+
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    partition of the specified partitioned index (which must be specified).
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 61853e6dec4..196c7f7eeb2 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+		Oid indexOid);
+static void cluster_multiple_rels(List *rvs, int options);
 
 
 /*---------------------------------------------------------------------------
@@ -131,6 +136,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		Relation	rel;
 
 		/* Find, lock, and check permissions on the table */
+		/* Obtain AEL now to avoid lock-upgrade hazard in the single-transaction case */
 		tableOid = RangeVarGetRelidExtended(stmt->relation,
 											AccessExclusiveLock,
 											0,
@@ -146,14 +152,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -188,11 +186,50 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 								stmt->indexname, stmt->relation->relname)));
 		}
 
-		/* close relation, keep lock till commit */
-		table_close(rel, NoLock);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* close relation, keep lock till commit */
+			table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, &params);
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, &params);
+		}
+		else
+		{
+			List	   *rvs;
+			MemoryContext cluster_context;
+
+			/*
+			 * Expand partitioned relations for CLUSTER (the corresponding
+			 * thing for VACUUM FULL happens in and around expand_vacuum_rel()
+			 */
+
+			/* Refuse to hold strong locks in a user transaction */
+			PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+			cluster_context = AllocSetContextCreate(PortalContext,
+												"Cluster",
+												ALLOCSET_DEFAULT_SIZES);
+
+			rvs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+
+			/*
+			 * For now, partitioned indexes are not actually marked clustered.
+			 */
+			check_index_is_clusterable(rel, indexOid, true, AccessShareLock);
+
+			/* close relation, releasing lock on parent table */
+			table_close(rel, AccessExclusiveLock);
+
+			/* Do the job. */
+			cluster_multiple_rels(rvs, params.options);
+
+			/* Start a new transaction for the cleanup work. */
+			StartTransactionCommand();
+
+			/* Clean up working storage */
+			MemoryContextDelete(cluster_context);
+		}
 	}
 	else
 	{
@@ -202,7 +239,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 */
 		MemoryContext cluster_context;
 		List	   *rvs;
-		ListCell   *rv;
 
 		/*
 		 * We cannot run this form of CLUSTER inside a user transaction block;
@@ -225,28 +261,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 		 * cluster_context.
 		 */
 		rvs = get_tables_to_cluster(cluster_context);
-
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
-
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-			ClusterParams cluster_params = params;
-
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_params.options |= CLUOPT_RECHECK;
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						&cluster_params);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+		cluster_multiple_rels(rvs, params.options | CLUOPT_RECHECK_ISCLUSTERED);
 
 		/* Start a new transaction for the cleanup work. */
 		StartTransactionCommand();
@@ -327,10 +342,11 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		/*
 		 * Silently skip a temp table for a remote session.  Only doing this
 		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER), because there is
-		 * another check in cluster() which will stop any attempt to cluster
-		 * remote temp tables by name.  There is another check in cluster_rel
-		 * which is redundant, but we leave it for extra safety.
+		 * somebody is executing a database-wide CLUSTER or on a partitioned
+		 * table), because there is another check in cluster() which will stop
+		 * any attempt to cluster remote temp tables by name.  There is another
+		 * check in cluster_rel which is redundant, but we leave it for extra
+		 * safety.
 		 */
 		if (RELATION_IS_OTHER_TEMP(OldHeap))
 		{
@@ -352,9 +368,10 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set, if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+					!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -415,6 +432,9 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		return;
 	}
 
+	// Should silently skip this rather than assert ?
+	Assert(RELKIND_HAS_STORAGE(OldHeap->rd_rel->relkind));
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -585,8 +605,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
 	if (OidIsValid(indexOid))
+		/* Mark the correct index as clustered */
 		mark_index_clustered(OldHeap, indexOid, true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1604,3 +1624,76 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	return rvs;
 }
+
+/*
+ * Return a List of tables and their associated indexes, where each index is a
+ * partition of the given index
+ * We're called with a lock held on the parent table.
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List		*inhoids;
+	ListCell	*lc;
+	List		*rvs = NIL;
+	MemoryContext	old_context;
+
+	/* Do not lock the children until they're processed. */
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	/*
+	 * We have to build the list in a different memory context so it will
+	 * survive the cross-transaction processing
+	 */
+	old_context = MemoryContextSwitchTo(cluster_context);
+
+	foreach(lc, inhoids)
+	{
+		Oid		indexrelid = lfirst_oid(lc);
+		Oid		relid = IndexGetRelation(indexrelid, false);
+		RelToCluster	*rvtc;
+
+		if (!RELKIND_HAS_STORAGE(get_rel_relkind(indexrelid)))
+			continue;
+
+		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rvtc->tableOid = relid;
+		rvtc->indexOid = indexrelid;
+		rvs = lappend(rvs, rvtc);
+	}
+
+	MemoryContextSwitchTo(old_context);
+	return rvs;
+}
+
+/* Cluster each relation in a separate transaction */
+static void
+cluster_multiple_rels(List *rvs, int options)
+{
+	ListCell *lc;
+
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
+
+	/* Ok, now that we've got them all, cluster them one by one */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
+		ClusterParams cluster_params = { .options = options, };
+
+		/* Start a new transaction for each relation. */
+		StartTransactionCommand();
+
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_params.options |= CLUOPT_RECHECK;
+		cluster_rel(rvtc->tableOid, rvtc->indexOid,
+					&cluster_params);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
+	}
+}
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 39be6f556a8..9944f21f71c 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -599,6 +599,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 3db375d7cc7..0605b053b61 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -22,6 +22,7 @@
 /* flag bits for ClusterParams->flags */
 #define CLUOPT_RECHECK 0x01		/* recheck relation state */
 #define CLUOPT_VERBOSE 0x02		/* print progress info */
+#define CLUOPT_RECHECK_ISCLUSTERED 0x04	/* recheck relation state for indisclustered */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e46a66952f0..3f2758d13f6 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -444,13 +444,52 @@ DROP TABLE clustertest;
 CREATE TABLE clustertest (f1 int PRIMARY KEY);
 CLUSTER clustertest USING clustertest_pkey;
 CLUSTER clustertest;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+CLUSTER clstrpart USING clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- Partitioned indexes aren't and can't be marked un/clustered:
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+CLUSTER clstrpart;
+ERROR:  there is no previously clustered index for table "clstrpart"
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
-CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
 DROP TABLE clstrpart;
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index aee9cf83e04..74118993a82 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -202,11 +202,28 @@ CREATE TABLE clustertest (f1 int PRIMARY KEY);
 CLUSTER clustertest USING clustertest_pkey;
 CLUSTER clustertest;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+-- Partitioned indexes aren't and can't be marked un/clustered:
+\d clstrpart
+CLUSTER clstrpart;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 DROP TABLE clstrpart;
 
 -- Test CLUSTER with external tuplesorting
-- 
2.17.1

#20

Justin Pryzby

pryzby@telsasoft.com

almost 4 years ago

In reply to: Alvaro Herrera (#12)

Re: CLUSTER on partitioned index

On Tue, Jul 20, 2021 at 08:27:02PM -0400, Alvaro Herrera wrote:

I have to wonder if there really *is* a use case for CLUSTER in the
first place on regular tables, let alone on partitioned tables, which
are likely to be large and thus take a lot of time. What justifies
spending so much time on this implementation? My impression is that
CLUSTER is pretty much a fringe command nowadays, because of the access
exclusive lock required.

Does anybody actually use it?

I hope that Alvaro will comment on the simplified patches. If multiple people
think the patch isn't worth it, feel free to close it. But I don't see how
complexity could be the reason.

--
Justin

#21

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 4 years ago

In reply to: Justin Pryzby (#20)

1 attachment(s)

Re: CLUSTER on partitioned index

On 2022-Feb-23, Justin Pryzby wrote:

I hope that Alvaro will comment on the simplified patches. If multiple people
think the patch isn't worth it, feel free to close it. But I don't see how
complexity could be the reason.

I gave your patch a look and it seems a reasonable thing to do. Maybe
not terribly useful in most cases, but there may be some cases for which
it is. I found some part of it a bit repetitive, so I moved things
around a bit. What do think about this?

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php

Attachments:

clustering-partitioned-tables.patchtext/x-diff; charset=utf-8Download

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index 86f5fdc469..b3463ae5c4 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -196,6 +196,12 @@ CLUSTER [VERBOSE]
     in the <structname>pg_stat_progress_cluster</structname> view. See
     <xref linkend="cluster-progress-reporting"/> for details.
   </para>
+
+   <para>
+    Clustering a partitioned table clusters each of its partitions using the
+    partition of the specified partitioned index (which must be specified).
+   </para>
+
  </refsect1>
 
  <refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 02a7e94bf9..8417cbdb67 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -32,7 +32,9 @@
 #include "catalog/index.h"
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
+#include "catalog/partition.h"
 #include "catalog/pg_am.h"
+#include "catalog/pg_inherits.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
 #include "commands/defrem.h"
@@ -73,6 +75,9 @@ static void copy_table_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex,
 							bool verbose, bool *pSwapToastByContent,
 							TransactionId *pFreezeXid, MultiXactId *pCutoffMulti);
 static List *get_tables_to_cluster(MemoryContext cluster_context);
+static List *get_tables_to_cluster_partitioned(MemoryContext cluster_context,
+											   Oid indexOid);
+static void cluster_multiple_rels(List *rvs, ClusterParams *params);
 
 
 /*---------------------------------------------------------------------------
@@ -105,6 +110,10 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	ListCell   *lc;
 	ClusterParams params = {0};
 	bool		verbose = false;
+	Relation	rel = NULL;
+	Oid			indexOid = InvalidOid;
+	MemoryContext cluster_context;
+	List	   *rtcs;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -126,11 +135,13 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	if (stmt->relation != NULL)
 	{
 		/* This is the single-relation case. */
-		Oid			tableOid,
-					indexOid = InvalidOid;
-		Relation	rel;
+		Oid			tableOid;
 
-		/* Find, lock, and check permissions on the table */
+		/*
+		 * Find, lock, and check permissions on the table.  We obtain
+		 * AccessExclusiveLock right away to avoid lock-upgrade hazard in the
+		 * single-transaction case.
+		 */
 		tableOid = RangeVarGetRelidExtended(stmt->relation,
 											AccessExclusiveLock,
 											0,
@@ -146,14 +157,6 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("cannot cluster temporary tables of other sessions")));
 
-		/*
-		 * Reject clustering a partitioned table.
-		 */
-		if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
-			ereport(ERROR,
-					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-					 errmsg("cannot cluster a partitioned table")));
-
 		if (stmt->indexname == NULL)
 		{
 			ListCell   *index;
@@ -188,71 +191,100 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 								stmt->indexname, stmt->relation->relname)));
 		}
 
-		/* close relation, keep lock till commit */
-		table_close(rel, NoLock);
+		if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE)
+		{
+			/* close relation, keep lock till commit */
+			table_close(rel, NoLock);
 
-		/* Do the job. */
-		cluster_rel(tableOid, indexOid, &params);
+			/* Do the job. */
+			cluster_rel(tableOid, indexOid, &params);
+
+			return;
+		}
+	}
+
+	/*
+	 * By here, we know we are in a multi-table situation.  In order to avoid
+	 * holding locks for too long, we want to process each table in its own
+	 * transaction.  This forces us to disallow running inside a user
+	 * transaction block.
+	 */
+	PreventInTransactionBlock(isTopLevel, "CLUSTER");
+
+	/* Also, we need a memory context to hold our list of relations */
+	cluster_context = AllocSetContextCreate(PortalContext,
+											"Cluster",
+											ALLOCSET_DEFAULT_SIZES);
+
+	/*
+	 * Either we're processing a partitioned table, or we were not given any
+	 * table name at all.  In either case, obtain a list of relations to
+	 * process.
+	 *
+	 * In the former case, an index name must have been given, so we don't
+	 * need to recheck its "indisclustered" bit, but we have to check that it
+	 * is an index that we can cluster on.  In the latter case, we set the
+	 * option bit to have indisclustered verified.
+	 *
+	 * Rechecking the relation itself is necessary here in all cases.
+	 */
+	params.options |= CLUOPT_RECHECK;
+	if (rel != NULL)
+	{
+		check_index_is_clusterable(rel, indexOid, true, AccessShareLock);
+		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
+
+		/* close relation, releasing lock on parent table */
+		table_close(rel, AccessExclusiveLock);
 	}
 	else
 	{
-		/*
-		 * This is the "multi relation" case. We need to cluster all tables
-		 * that have some index with indisclustered set.
-		 */
-		MemoryContext cluster_context;
-		List	   *rvs;
-		ListCell   *rv;
+		rtcs = get_tables_to_cluster(cluster_context);
+		params.options |= CLUOPT_RECHECK_ISCLUSTERED;
+	}
 
-		/*
-		 * We cannot run this form of CLUSTER inside a user transaction block;
-		 * we'd be holding locks way too long.
-		 */
-		PreventInTransactionBlock(isTopLevel, "CLUSTER");
+	/* Do the job. */
+	cluster_multiple_rels(rtcs, &params);
 
-		/*
-		 * Create special memory context for cross-transaction storage.
-		 *
-		 * Since it is a child of PortalContext, it will go away even in case
-		 * of error.
-		 */
-		cluster_context = AllocSetContextCreate(PortalContext,
-												"Cluster",
-												ALLOCSET_DEFAULT_SIZES);
+	/* Start a new transaction for the cleanup work. */
+	StartTransactionCommand();
 
-		/*
-		 * Build the list of relations to cluster.  Note that this lives in
-		 * cluster_context.
-		 */
-		rvs = get_tables_to_cluster(cluster_context);
+	/* Clean up working storage */
+	MemoryContextDelete(cluster_context);
+}
 
-		/* Commit to get out of starting transaction */
-		PopActiveSnapshot();
-		CommitTransactionCommand();
+/*
+ * Given a list of relations to cluster, process each of them in a separate
+ * transaction.
+ *
+ * We expect to be in a transaction at start, but there isn't one when we
+ * return.
+ */
+static void
+cluster_multiple_rels(List *rvs, ClusterParams *params)
+{
+	ListCell   *lc;
 
-		/* Ok, now that we've got them all, cluster them one by one */
-		foreach(rv, rvs)
-		{
-			RelToCluster *rvtc = (RelToCluster *) lfirst(rv);
-			ClusterParams cluster_params = params;
+	/* Commit to get out of starting transaction */
+	PopActiveSnapshot();
+	CommitTransactionCommand();
 
-			/* Start a new transaction for each relation. */
-			StartTransactionCommand();
-			/* functions in indexes may want a snapshot set */
-			PushActiveSnapshot(GetTransactionSnapshot());
-			/* Do the job. */
-			cluster_params.options |= CLUOPT_RECHECK;
-			cluster_rel(rvtc->tableOid, rvtc->indexOid,
-						&cluster_params);
-			PopActiveSnapshot();
-			CommitTransactionCommand();
-		}
+	/* Cluster the tables, each in a separate transaction */
+	foreach(lc, rvs)
+	{
+		RelToCluster *rvtc = (RelToCluster *) lfirst(lc);
 
-		/* Start a new transaction for the cleanup work. */
+		/* Start a new transaction for each relation. */
 		StartTransactionCommand();
 
-		/* Clean up working storage */
-		MemoryContextDelete(cluster_context);
+		/* functions in indexes may want a snapshot set */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
+		/* Do the job. */
+		cluster_rel(rvtc->tableOid, rvtc->indexOid, params);
+
+		PopActiveSnapshot();
+		CommitTransactionCommand();
 	}
 }
 
@@ -327,10 +359,11 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		/*
 		 * Silently skip a temp table for a remote session.  Only doing this
 		 * check in the "recheck" case is appropriate (which currently means
-		 * somebody is executing a database-wide CLUSTER), because there is
-		 * another check in cluster() which will stop any attempt to cluster
-		 * remote temp tables by name.  There is another check in cluster_rel
-		 * which is redundant, but we leave it for extra safety.
+		 * somebody is executing a database-wide CLUSTER or on a partitioned
+		 * table), because there is another check in cluster() which will stop
+		 * any attempt to cluster remote temp tables by name.  There is
+		 * another check in cluster_rel which is redundant, but we leave it
+		 * for extra safety.
 		 */
 		if (RELATION_IS_OTHER_TEMP(OldHeap))
 		{
@@ -352,9 +385,11 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 			}
 
 			/*
-			 * Check that the index is still the one with indisclustered set.
+			 * Check that the index is still the one with indisclustered set,
+			 * if needed.
 			 */
-			if (!get_index_isclustered(indexOid))
+			if ((params->options & CLUOPT_RECHECK_ISCLUSTERED) != 0 &&
+				!get_index_isclustered(indexOid))
 			{
 				relation_close(OldHeap, AccessExclusiveLock);
 				pgstat_progress_end_command();
@@ -415,6 +450,9 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 		return;
 	}
 
+	Assert(OldHeap->rd_rel->relkind == RELKIND_RELATION ||
+		   OldHeap->rd_rel->relkind == RELKIND_MATVIEW);
+
 	/*
 	 * All predicate locks on the tuples or pages are about to be made
 	 * invalid, because we move tuples around.  Promote them to relation
@@ -585,8 +623,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	TransactionId frozenXid;
 	MultiXactId cutoffMulti;
 
-	/* Mark the correct index as clustered */
 	if (OidIsValid(indexOid))
+		/* Mark the correct index as clustered */
 		mark_index_clustered(OldHeap, indexOid, true);
 
 	/* Remember info about rel before closing OldHeap */
@@ -1528,8 +1566,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 
 			/*
 			 * Reset the relrewrite for the toast. The command-counter
-			 * increment is required here as we are about to update
-			 * the tuple that is updated as part of RenameRelationInternal.
+			 * increment is required here as we are about to update the tuple
+			 * that is updated as part of RenameRelationInternal.
 			 */
 			CommandCounterIncrement();
 			ResetRelRewrite(newrel->rd_rel->reltoastrelid);
@@ -1564,8 +1602,7 @@ get_tables_to_cluster(MemoryContext cluster_context)
 	HeapTuple	indexTuple;
 	Form_pg_index index;
 	MemoryContext old_context;
-	RelToCluster *rvtc;
-	List	   *rvs = NIL;
+	List	   *rtcs = NIL;
 
 	/*
 	 * Get all indexes that have indisclustered set and are owned by
@@ -1579,21 +1616,20 @@ get_tables_to_cluster(MemoryContext cluster_context)
 	scan = table_beginscan_catalog(indRelation, 1, &entry);
 	while ((indexTuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
 	{
+		RelToCluster *rtc;
+
 		index = (Form_pg_index) GETSTRUCT(indexTuple);
 
 		if (!pg_class_ownercheck(index->indrelid, GetUserId()))
 			continue;
 
-		/*
-		 * We have to build the list in a different memory context so it will
-		 * survive the cross-transaction processing
-		 */
+		/* Use a permanent memory context for the result list */
 		old_context = MemoryContextSwitchTo(cluster_context);
 
-		rvtc = (RelToCluster *) palloc(sizeof(RelToCluster));
-		rvtc->tableOid = index->indrelid;
-		rvtc->indexOid = index->indexrelid;
-		rvs = lappend(rvs, rvtc);
+		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rtc->tableOid = index->indrelid;
+		rtc->indexOid = index->indexrelid;
+		rtcs = lappend(rtcs, rtc);
 
 		MemoryContextSwitchTo(old_context);
 	}
@@ -1601,5 +1637,45 @@ get_tables_to_cluster(MemoryContext cluster_context)
 
 	relation_close(indRelation, AccessShareLock);
 
-	return rvs;
+	return rtcs;
+}
+
+/*
+ * Given an index on a partitioned table, return a list of RelToCluster for
+ * all the children leaves tables/indexes.
+ *
+ * Caller must hold lock on the table containing the index.
+ */
+static List *
+get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
+{
+	List	   *inhoids;
+	ListCell   *lc;
+	List	   *rtcs = NIL;
+	MemoryContext old_context;
+
+	/* Do not lock the children until they're processed */
+	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
+
+	/* Use a permanent memory context for the result list */
+	old_context = MemoryContextSwitchTo(cluster_context);
+
+	foreach(lc, inhoids)
+	{
+		Oid			indexrelid = lfirst_oid(lc);
+		Oid			relid = IndexGetRelation(indexrelid, false);
+		RelToCluster *rtc;
+
+		/* consider only leaf indexes */
+		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
+			continue;
+
+		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
+		rtc->tableOid = relid;
+		rtc->indexOid = indexrelid;
+		rtcs = lappend(rtcs, rtc);
+	}
+
+	MemoryContextSwitchTo(old_context);
+	return rtcs;
 }
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 51b4a00d50..7febb5018f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -16911,11 +16911,11 @@ AtEOSubXact_on_commit_actions(bool isCommit, SubTransactionId mySubid,
 
 /*
  * This is intended as a callback for RangeVarGetRelidExtended().  It allows
- * the relation to be locked only if (1) it's a plain table, materialized
- * view, or TOAST table and (2) the current user is the owner (or the
- * superuser).  This meets the permission-checking needs of CLUSTER, REINDEX
- * TABLE, and REFRESH MATERIALIZED VIEW; we expose it here so that it can be
- * used by all.
+ * the relation to be locked only if (1) it's a plain or partitioned table,
+ * materialized view, or TOAST table and (2) the current user is the owner (or
+ * the superuser).  This meets the permission-checking needs of CLUSTER,
+ * REINDEX TABLE, and REFRESH MATERIALIZED VIEW; we expose it here so that it
+ * can be used by all.
  */
 void
 RangeVarCallbackOwnsTable(const RangeVar *relation,
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 3f9dfffd57..6f040674a2 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -897,6 +897,7 @@ static const SchemaQuery Query_for_list_of_clusterables = {
 	.catname = "pg_catalog.pg_class c",
 	.selcondition =
 	"c.relkind IN (" CppAsString2(RELKIND_RELATION) ", "
+	CppAsString2(RELKIND_PARTITIONED_TABLE) ", "
 	CppAsString2(RELKIND_MATVIEW) ")",
 	.viscondition = "pg_catalog.pg_table_is_visible(c.oid)",
 	.namespace = "c.relnamespace",
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 3db375d7cc..3c279f6210 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -19,9 +19,11 @@
 #include "utils/relcache.h"
 
 
-/* flag bits for ClusterParams->flags */
-#define CLUOPT_RECHECK 0x01		/* recheck relation state */
-#define CLUOPT_VERBOSE 0x02		/* print progress info */
+/* flag bits for ClusterParams->options */
+#define CLUOPT_VERBOSE 0x01		/* print progress info */
+#define CLUOPT_RECHECK 0x02		/* recheck relation state */
+#define CLUOPT_RECHECK_ISCLUSTERED 0x04 /* recheck relation state for
+										 * indisclustered */
 
 /* options for CLUSTER */
 typedef struct ClusterParams
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index e46a66952f..3f2758d13f 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -444,13 +444,52 @@ DROP TABLE clustertest;
 CREATE TABLE clustertest (f1 int PRIMARY KEY);
 CLUSTER clustertest USING clustertest_pkey;
 CLUSTER clustertest;
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+ERROR:  cannot cluster on invalid index "clstrpart_only_idx"
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+CLUSTER clstrpart USING clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+   relname   | level | relkind | ?column? 
+-------------+-------+---------+----------
+ clstrpart   |     0 | p       | t
+ clstrpart1  |     1 | p       | t
+ clstrpart11 |     2 | r       | f
+ clstrpart12 |     2 | p       | t
+ clstrpart2  |     1 | r       | f
+ clstrpart3  |     1 | p       | t
+ clstrpart33 |     2 | r       | f
+(7 rows)
+
+-- Partitioned indexes aren't and can't be marked un/clustered:
+\d clstrpart
+       Partitioned table "public.clstrpart"
+ Column |  Type   | Collation | Nullable | Default 
+--------+---------+-----------+----------+---------
+ a      | integer |           |          | 
+Partition key: RANGE (a)
+Indexes:
+    "clstrpart_idx" btree (a)
+Number of partitions: 3 (Use \d+ to list them.)
+
+CLUSTER clstrpart;
+ERROR:  there is no previously clustered index for table "clstrpart"
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
-CLUSTER clstrpart USING clstrpart_idx;
-ERROR:  cannot cluster a partitioned table
 DROP TABLE clstrpart;
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index aee9cf83e0..74118993a8 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -202,11 +202,28 @@ CREATE TABLE clustertest (f1 int PRIMARY KEY);
 CLUSTER clustertest USING clustertest_pkey;
 CLUSTER clustertest;
 
--- Check that partitioned tables cannot be clustered
+-- Check that partitioned tables can be clustered
 CREATE TABLE clstrpart (a int) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart1 PARTITION OF clstrpart FOR VALUES FROM (1)TO(10) PARTITION BY RANGE (a);
+CREATE TABLE clstrpart11 PARTITION OF clstrpart1 FOR VALUES FROM (1)TO(10);
+CREATE TABLE clstrpart12 PARTITION OF clstrpart1 FOR VALUES FROM (10)TO(20) PARTITION BY RANGE(a);
+CREATE TABLE clstrpart2 PARTITION OF clstrpart FOR VALUES FROM (20)TO(30);
+CREATE TABLE clstrpart3 PARTITION OF clstrpart DEFAULT PARTITION BY RANGE(a);
+CREATE TABLE clstrpart33 PARTITION OF clstrpart3 DEFAULT;
+CREATE INDEX clstrpart_only_idx ON ONLY clstrpart (a);
+CLUSTER clstrpart USING clstrpart_only_idx; -- fails
+DROP INDEX clstrpart_only_idx;
 CREATE INDEX clstrpart_idx ON clstrpart (a);
-ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
+-- Check that clustering sets new relfilenodes:
+CREATE TEMP TABLE old_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
 CLUSTER clstrpart USING clstrpart_idx;
+CREATE TEMP TABLE new_cluster_info AS SELECT relname, level, relfilenode, relkind FROM pg_partition_tree('clstrpart'::regclass) AS tree JOIN pg_class c ON c.oid=tree.relid ;
+SELECT relname, old.level, old.relkind, old.relfilenode = new.relfilenode FROM old_cluster_info AS old JOIN new_cluster_info AS new USING (relname) ORDER BY relname COLLATE "C";
+-- Partitioned indexes aren't and can't be marked un/clustered:
+\d clstrpart
+CLUSTER clstrpart;
+ALTER TABLE clstrpart SET WITHOUT CLUSTER;
+ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 DROP TABLE clstrpart;
 
 -- Test CLUSTER with external tuplesorting

#22

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 4 years ago

In reply to: Alvaro Herrera (#21)

Re: CLUSTER on partitioned index

I realized after posting that we used to allow clustering toast tables,
but after my changes we no longer do. (Justin's version had a
RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I
don't know why we allowed it and I don't know of anyone who has ever
used that feature and we don't have any test coverage for it, but I
don't have any reason to explicitly disallow it either. So I propose to
continue to allow it:

From 05ba6124422fb7c2fd19575e905e444ba3eef1e5 Mon Sep 17 00:00:00 2001

From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Thu, 31 Mar 2022 12:49:57 +0200
Subject: [PATCH] allow to cluster toast tables

---
src/backend/commands/cluster.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8417cbdb67..b391d7c434 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -451,7 +451,8 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 	}

 	Assert(OldHeap->rd_rel->relkind == RELKIND_RELATION ||
-		   OldHeap->rd_rel->relkind == RELKIND_MATVIEW);
+		   OldHeap->rd_rel->relkind == RELKIND_MATVIEW ||
+		   OldHeap->rd_rel->relkind == RELKIND_TOASTVALUE);

/*
* All predicate locks on the tuples or pages are about to be made
--
2.30.2

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Java is clearly an example of money oriented programming" (A. Stepanov)

#23

Robert Haas

robertmhaas@gmail.com

almost 4 years ago

In reply to: Alvaro Herrera (#22)

Re: CLUSTER on partitioned index

On Thu, Mar 31, 2022 at 6:54 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

I realized after posting that we used to allow clustering toast tables,
but after my changes we no longer do. (Justin's version had a
RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I
don't know why we allowed it and I don't know of anyone who has ever
used that feature and we don't have any test coverage for it, but I
don't have any reason to explicitly disallow it either. So I propose to
continue to allow it:

I think that's probably a good decision. It's certainly useful to have
a way to force a rewrite of a TOAST table, although a lot of people
who would benefit from that operation probably don't know that they
need it, or don't know that they need just that, and end up rewriting
both the main table and the TOAST table. Whether it's useful to be
able to run CLUSTER specifically rather than VACUUM FULL on the TOAST
table is less clear, but I don't think we're likely to save anything
by forbidding it. Maybe we should consider adding a test, though.

--
Robert Haas
EDB: http://www.enterprisedb.com

#24

Justin Pryzby

pryzby@telsasoft.com

almost 4 years ago

In reply to: Alvaro Herrera (#21)

Re: CLUSTER on partitioned index

On Wed, Mar 30, 2022 at 10:51:43PM +0200, Alvaro Herrera wrote:

On 2022-Feb-23, Justin Pryzby wrote:

I hope that Alvaro will comment on the simplified patches. If multiple people
think the patch isn't worth it, feel free to close it. But I don't see how
complexity could be the reason.

I gave your patch a look and it seems a reasonable thing to do. Maybe
not terribly useful in most cases, but there may be some cases for which
it is. I found some part of it a bit repetitive, so I moved things
around a bit. What do think about this?

Thanks for looking at it.

The changes to finish_heap_swap() and get_tables_to_cluster() are superfluous,
right ?

I think this comment is worth preserving (it'd be okay if it lived in the
commit message).
- * Expand partitioned relations for CLUSTER (the corresponding
- * thing for VACUUM FULL happens in and around expand_vacuum_rel()

+ if (rel != NULL) In this case, maybe it should Assert() that it's
relkind=p (mostly for purposes of self-documentation).

+ partition of the specified partitioned index (which must be specified).
This is my own language, but now seems repetitive. I think the parenthetical
part should be a separate sentance: "For partitioned indexes, the index may not
be omitted.".

Otherwise looks ok.

diff --git a/doc/src/sgml/ref/cluster.sgml b/doc/src/sgml/ref/cluster.sgml
index b3463ae5c46..fbc090cd0b0 100644
--- a/doc/src/sgml/ref/cluster.sgml
+++ b/doc/src/sgml/ref/cluster.sgml
@@ -199,7 +199,8 @@ CLUSTER [VERBOSE]

    <para>
     Clustering a partitioned table clusters each of its partitions using the
-    partition of the specified partitioned index (which must be specified).
+    partition of the specified partitioned index.  When clustering a
+    partitioned table, the index may not be omitted.
    </para>

  </refsect1>
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 8417cbdb67f..412147f05bc 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -231,6 +231,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	params.options |= CLUOPT_RECHECK;
 	if (rel != NULL)
 	{
+		Assert (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
 		check_index_is_clusterable(rel, indexOid, true, AccessShareLock);
 		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);

@@ -451,6 +452,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
}

Assert(OldHeap->rd_rel->relkind == RELKIND_RELATION ||
+ OldHeap->rd_rel->relkind == RELKIND_TOASTVALUE ||
OldHeap->rd_rel->relkind == RELKIND_MATVIEW);

 	/*
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 3f2758d13f6..6cf18c8d321 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -305,6 +305,8 @@ WHERE pg_class.oid=indexrelid
 ---------
 (0 rows)

+-- Verify that toast is clusterable
+CLUSTER pg_toast.pg_toast_826 USING pg_toast_826_index;
 -- Verify that clustering all tables does in fact cluster the right ones
 CREATE USER regress_clstr_user;
 CREATE TABLE clstr_1 (a INT PRIMARY KEY);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 74118993a82..ae27c35f65d 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -104,6 +104,9 @@ WHERE pg_class.oid=indexrelid
 	AND pg_class_2.relname = 'clstr_tst'
 	AND indisclustered;

+-- Verify that toast is clusterable
+CLUSTER pg_toast.pg_toast_826 USING pg_toast_826_index;
+
 -- Verify that clustering all tables does in fact cluster the right ones
 CREATE USER regress_clstr_user;
 CREATE TABLE clstr_1 (a INT PRIMARY KEY);

#25

Justin Pryzby

pryzby@telsasoft.com

almost 4 years ago

In reply to: Alvaro Herrera (#22)

Re: CLUSTER on partitioned index

On Thu, Mar 31, 2022 at 12:54:36PM +0200, Alvaro Herrera wrote:

I realized after posting that we used to allow clustering toast tables,
but after my changes we no longer do. (Justin's version had a
RELKIND_HAS_STORAGE test here instead, which seemed a little too lax.) I
don't know why we allowed it and I don't know of anyone who has ever
used that feature and we don't have any test coverage for it, but I
don't have any reason to explicitly disallow it either. So I propose to
continue to allow it:

Good catch.

My daily vacuum script would've discovered that they're no longer supported, as
it tests for (among other things) c.relkind IN ('r','t'). That clusters tables
that have an indisclustered set and vacuums various others. (BTW, it's the
same script that discovered in 2019 that clustering on expressional indexes had
been broken by the heapam changes).

I think the response should be to add a test case, which could be 0001 or
00099.

#26

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 4 years ago

In reply to: Justin Pryzby (#24)

Re: CLUSTER on partitioned index

Thanks, pushed.

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
"La fuerza no está en los medios físicos
sino que reside en una voluntad indomable" (Gandhi)

#27

Alvaro Herrera

alvherre@alvh.no-ip.org

almost 4 years ago

In reply to: Alvaro Herrera (#26)

Re: CLUSTER on partitioned index

Small things here.

1. in VACUUM FULL we only process partitions that are owned by the
invoking user. We don't have this test in the new code. I'm not sure
why do we do that there; is it worth doing the same here?

2. We should silently skip a partition that's a foreign table, I
suppose.

3. We do mark the index on the partitions as indisclustered AFAICS (we
claim that the partitioned table's index is not marked, which is
accurate). So users doing unadorned CLUSTER afterwards will get the
partitions clustered too, once they cluster the partitioned table. If
they don't want this, they would have to ALTER TABLE to remove the
marking. How likely is that this will be a problem? Maybe documenting
this point is enough.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Voy a acabar con todos los humanos / con los humanos yo acabaré
voy a acabar con todos (bis) / con todos los humanos acabaré ¡acabaré! (Bender)

#28

Justin Pryzby

pryzby@telsasoft.com

almost 4 years ago

In reply to: Alvaro Herrera (#26)

Re: CLUSTER on partitioned index

On Sat, Apr 02, 2022 at 07:11:47PM +0200, Alvaro Herrera wrote:

Thanks, pushed.

Thank you for revisiting it, and thanks to Zhihong Yu for earlier review.

I'll look into your outstanding questions later this week.

--
Justin

#29

Justin Pryzby

pryzby@telsasoft.com

almost 4 years ago

In reply to: Alvaro Herrera (#27)

2 attachment(s)

Re: CLUSTER on partitioned index

On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote:

Small things here.

1. in VACUUM FULL we only process partitions that are owned by the
invoking user. We don't have this test in the new code. I'm not sure
why do we do that there; is it worth doing the same here?

That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER
itself). The reason was to avoid blocking if an unprivileged user runs VACUUM
FULL which would try to lock things (including shared catalogs) before checking
if they have permission to vacuum them. That commit also initially checks the
owner of the partitioned table, and then re-checking owner of partitions later
on.

A similar issue exists here. But 1) catalog tables are not partitioned, and,
2) ownership of a partitioned table is checked immediately. So the problem can
only occur if a user who owns a partitioned table but doesn't own all its
partitions tries to cluster it, and it blocks behind another session. Fixing
this is probably a good idea, but seems improbable that it would avoid a DOS.

2. We should silently skip a partition that's a foreign table, I
suppose.

I think it's not needed, since the loop over index children doesn't see a child
index on the foreign table. ?

3. We do mark the index on the partitions as indisclustered AFAICS (we
claim that the partitioned table's index is not marked, which is
accurate). So users doing unadorned CLUSTER afterwards will get the
partitions clustered too, once they cluster the partitioned table. If
they don't want this, they would have to ALTER TABLE to remove the
marking. How likely is that this will be a problem? Maybe documenting
this point is enough.

It seems at least as likely that someone would *want* the partitions to be
marked clustered as that someone would want them to be unchanged.

The cluster mark accurately reflects having been clustered. It seems unlikely
that a user would want something else to be clustered later by "cluster;".
Since clustering on a partitioned table wasn't supported before, nothing weird
will happen to someone who upgrades to v15 unless they elect to use the new
feature. As this seems to be POLA, it doesn't even need to be documented. ?

Attachments:

v13-0001-Recheck-arg-is-not-needed-since-2011.patchtext/x-diff; charset=us-asciiDownload

From 19c02209dfbcbf73494e1e6d3ca0db50f64dc5fd Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Fri, 24 Sep 2021 14:18:52 -0500
Subject: [PATCH v13 1/2] Recheck arg is not needed since 2011

It was added at: a4dde3bff36dac1ac0b699becad6f103d861a874
And not used since: 7e2f906201c8bb95f7fb17e56b8740c38bda5441
---
 src/backend/commands/cluster.c   | 6 +++---
 src/backend/commands/tablecmds.c | 2 +-
 src/include/commands/cluster.h   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 322d6bb2f18..0f0a6e9f018 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -232,7 +232,7 @@ cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel)
 	if (rel != NULL)
 	{
 		Assert(rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE);
-		check_index_is_clusterable(rel, indexOid, true, AccessShareLock);
+		check_index_is_clusterable(rel, indexOid, AccessShareLock);
 		rtcs = get_tables_to_cluster_partitioned(cluster_context, indexOid);
 
 		/* close relation, releasing lock on parent table */
@@ -434,7 +434,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
 
 	/* Check heap and index are valid to cluster on */
 	if (OidIsValid(indexOid))
-		check_index_is_clusterable(OldHeap, indexOid, recheck, AccessExclusiveLock);
+		check_index_is_clusterable(OldHeap, indexOid, AccessExclusiveLock);
 
 	/*
 	 * Quietly ignore the request if this is a materialized view which has not
@@ -480,7 +480,7 @@ cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params)
  * protection here.
  */
 void
-check_index_is_clusterable(Relation OldHeap, Oid indexOid, bool recheck, LOCKMODE lockmode)
+check_index_is_clusterable(Relation OldHeap, Oid indexOid, LOCKMODE lockmode)
 {
 	Relation	OldIndex;
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 90edd0bb97d..1d7db41d172 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14055,7 +14055,7 @@ ATExecClusterOn(Relation rel, const char *indexName, LOCKMODE lockmode)
 						indexName, RelationGetRelationName(rel))));
 
 	/* Check index is valid to cluster on */
-	check_index_is_clusterable(rel, indexOid, false, lockmode);
+	check_index_is_clusterable(rel, indexOid, lockmode);
 
 	/* And do the work */
 	mark_index_clustered(rel, indexOid, false);
diff --git a/src/include/commands/cluster.h b/src/include/commands/cluster.h
index 3c279f6210a..df8e73af409 100644
--- a/src/include/commands/cluster.h
+++ b/src/include/commands/cluster.h
@@ -34,7 +34,7 @@ typedef struct ClusterParams
 extern void cluster(ParseState *pstate, ClusterStmt *stmt, bool isTopLevel);
 extern void cluster_rel(Oid tableOid, Oid indexOid, ClusterParams *params);
 extern void check_index_is_clusterable(Relation OldHeap, Oid indexOid,
-									   bool recheck, LOCKMODE lockmode);
+									   LOCKMODE lockmode);
 extern void mark_index_clustered(Relation rel, Oid indexOid, bool is_internal);
 
 extern Oid	make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, Oid NewAccessMethod,
-- 
2.17.1

v13-0002-cluster-early-ownership-check-of-partitions.patchtext/x-diff; charset=us-asciiDownload

From 4fc03463d51cc55de676813a7a8ac7321ecc7be7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 6 Apr 2022 07:23:39 -0500
Subject: [PATCH v13 2/2] cluster: early ownership check of partitions

Similar to a556549d7, check the ownership of partitions before queueing them to
be clustered.  Otherwise, we may later wait on an exclusive lock on a partition
only for it to fails ownership check anyway, causing other queries to wait
behind cluster.
---
 src/backend/commands/cluster.c        | 19 +++++++++++++++----
 src/test/regress/expected/cluster.out | 21 +++++++++++++++++++++
 src/test/regress/sql/cluster.sql      | 15 +++++++++++++++
 3 files changed, 51 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 0f0a6e9f018..671ea3f45a2 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -1663,9 +1663,6 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 	/* Do not lock the children until they're processed */
 	inhoids = find_all_inheritors(indexOid, NoLock, NULL);
 
-	/* Use a permanent memory context for the result list */
-	old_context = MemoryContextSwitchTo(cluster_context);
-
 	foreach(lc, inhoids)
 	{
 		Oid			indexrelid = lfirst_oid(lc);
@@ -1676,12 +1673,26 @@ get_tables_to_cluster_partitioned(MemoryContext cluster_context, Oid indexOid)
 		if (get_rel_relkind(indexrelid) != RELKIND_INDEX)
 			continue;
 
+		/*
+		 * Silently skip partitions which the user has no access to.  They'll
+		 * be skipped later anyway, but this avoids blocking before the
+		 * permission check if another session is reading the table.
+		 */
+		if (!pg_class_ownercheck(relid, GetUserId()) &&
+				(!pg_database_ownercheck(MyDatabaseId, GetUserId()) ||
+				 IsSharedRelation(relid)))
+			continue;
+
+		/* Use a permanent memory context for the result list */
+		old_context = MemoryContextSwitchTo(cluster_context);
+
 		rtc = (RelToCluster *) palloc(sizeof(RelToCluster));
 		rtc->tableOid = relid;
 		rtc->indexOid = indexrelid;
 		rtcs = lappend(rtcs, rtc);
+
+		MemoryContextSwitchTo(old_context);
 	}
 
-	MemoryContextSwitchTo(old_context);
 	return rtcs;
 }
diff --git a/src/test/regress/expected/cluster.out b/src/test/regress/expected/cluster.out
index 953818c74e1..08e10d89dcf 100644
--- a/src/test/regress/expected/cluster.out
+++ b/src/test/regress/expected/cluster.out
@@ -493,6 +493,27 @@ ERROR:  cannot mark index clustered in partitioned table
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 ERROR:  cannot mark index clustered in partitioned table
 DROP TABLE clstrpart;
+-- Ownership of partitions is checked
+CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
+CREATE INDEX ptnowner_i_idx ON ptnowner(i);
+CREATE TABLE ptnowner1 PARTITION OF ptnowner FOR VALUES IN (1);
+CREATE ROLE ptnowner;
+CREATE TABLE ptnowner2 PARTITION OF ptnowner FOR VALUES IN (2);
+ALTER TABLE ptnowner1 OWNER TO ptnowner;
+ALTER TABLE ptnowner OWNER TO ptnowner;
+SET SESSION AUTHORIZATION ptnowner;
+CREATE TEMP TABLE ptnowner_oldnodes AS SELECT oid, relname, relfilenode FROM pg_partition_tree('ptnowner') AS tree JOIN pg_class AS c ON c.oid=tree.relid;
+CLUSTER ptnowner USING ptnowner_i_idx;
+SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
+  relname  | ?column? 
+-----------+----------
+ ptnowner  | t
+ ptnowner1 | f
+ ptnowner2 | t
+(3 rows)
+
+RESET SESSION AUTHORIZATION;
+DROP TABLE ptnowner;
 -- Test CLUSTER with external tuplesorting
 create table clstr_4 as select * from tenk1;
 create index cluster_sort on clstr_4 (hundred, thousand, tenthous);
diff --git a/src/test/regress/sql/cluster.sql b/src/test/regress/sql/cluster.sql
index 5601684ee3f..631eec1b739 100644
--- a/src/test/regress/sql/cluster.sql
+++ b/src/test/regress/sql/cluster.sql
@@ -229,6 +229,21 @@ ALTER TABLE clstrpart SET WITHOUT CLUSTER;
 ALTER TABLE clstrpart CLUSTER ON clstrpart_idx;
 DROP TABLE clstrpart;
 
+-- Ownership of partitions is checked
+CREATE TABLE ptnowner(i int unique) PARTITION BY LIST (i);
+CREATE INDEX ptnowner_i_idx ON ptnowner(i);
+CREATE TABLE ptnowner1 PARTITION OF ptnowner FOR VALUES IN (1);
+CREATE ROLE ptnowner;
+CREATE TABLE ptnowner2 PARTITION OF ptnowner FOR VALUES IN (2);
+ALTER TABLE ptnowner1 OWNER TO ptnowner;
+ALTER TABLE ptnowner OWNER TO ptnowner;
+SET SESSION AUTHORIZATION ptnowner;
+CREATE TEMP TABLE ptnowner_oldnodes AS SELECT oid, relname, relfilenode FROM pg_partition_tree('ptnowner') AS tree JOIN pg_class AS c ON c.oid=tree.relid;
+CLUSTER ptnowner USING ptnowner_i_idx;
+SELECT a.relname, a.relfilenode=b.relfilenode FROM pg_class a JOIN ptnowner_oldnodes b USING (oid) ORDER BY a.relname COLLATE "C";
+RESET SESSION AUTHORIZATION;
+DROP TABLE ptnowner;
+
 -- Test CLUSTER with external tuplesorting
 
 create table clstr_4 as select * from tenk1;
-- 
2.17.1

#30

Zhihong Yu

zyu@yugabyte.com

almost 4 years ago

In reply to: Justin Pryzby (#29)

Re: CLUSTER on partitioned index

On Mon, Apr 11, 2022 at 7:06 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote:

Small things here.

1. in VACUUM FULL we only process partitions that are owned by the
invoking user. We don't have this test in the new code. I'm not sure
why do we do that there; is it worth doing the same here?

That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in
CLUSTER
itself). The reason was to avoid blocking if an unprivileged user runs
VACUUM
FULL which would try to lock things (including shared catalogs) before
checking
if they have permission to vacuum them. That commit also initially checks
the
owner of the partitioned table, and then re-checking owner of partitions
later
on.

A similar issue exists here. But 1) catalog tables are not partitioned,
and,
2) ownership of a partitioned table is checked immediately. So the
problem can
only occur if a user who owns a partitioned table but doesn't own all its
partitions tries to cluster it, and it blocks behind another session.
Fixing
this is probably a good idea, but seems improbable that it would avoid a
DOS.

2. We should silently skip a partition that's a foreign table, I
suppose.

I think it's not needed, since the loop over index children doesn't see a
child
index on the foreign table. ?

3. We do mark the index on the partitions as indisclustered AFAICS (we
claim that the partitioned table's index is not marked, which is
accurate). So users doing unadorned CLUSTER afterwards will get the
partitions clustered too, once they cluster the partitioned table. If
they don't want this, they would have to ALTER TABLE to remove the
marking. How likely is that this will be a problem? Maybe documenting
this point is enough.

It seems at least as likely that someone would *want* the partitions to be
marked clustered as that someone would want them to be unchanged.

The cluster mark accurately reflects having been clustered. It seems
unlikely
that a user would want something else to be clustered later by "cluster;".
Since clustering on a partitioned table wasn't supported before, nothing
weird
will happen to someone who upgrades to v15 unless they elect to use the new
feature. As this seems to be POLA, it doesn't even need to be
documented. ?

Hi,
For v13-0002-cluster-early-ownership-check-of-partitions.patch :

only for it to fails ownership check anyway

to fails -> to fail

Cheers

#31

Michael Paquier

michael@paquier.xyz

almost 4 years ago

In reply to: Justin Pryzby (#29)

Re: CLUSTER on partitioned index

On Mon, Apr 11, 2022 at 09:06:09AM -0500, Justin Pryzby wrote:

On Sat, Apr 02, 2022 at 07:21:11PM +0200, Alvaro Herrera wrote:

1. in VACUUM FULL we only process partitions that are owned by the
invoking user. We don't have this test in the new code. I'm not sure
why do we do that there; is it worth doing the same here?

I think that adding a test is a good idea for such things. Perhaps we
could have an isolation test, but what Justin is proposing seems good
enough to me for this goal.

That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER
itself). The reason was to avoid blocking if an unprivileged user runs VACUUM
FULL which would try to lock things (including shared catalogs) before checking
if they have permission to vacuum them. That commit also initially checks the
owner of the partitioned table, and then re-checking owner of partitions later
on.

A similar issue exists here. But 1) catalog tables are not partitioned, and,
2) ownership of a partitioned table is checked immediately. So the problem can
only occur if a user who owns a partitioned table but doesn't own all its
partitions tries to cluster it, and it blocks behind another session. Fixing
this is probably a good idea, but seems improbable that it would avoid a DOS.

Catalogs are out of the picture as you say and I would not worry about
them becoming somewhat partitioned even in the far future. Are you
saying that it is possible for a user kicking a CLUSTER command on a
partitioned table who has no ownership on some of the partitions to
do some blocking table_open() calls if the permission check is not
done in get_tables_to_cluster_partitioned()? Hence, this user could
block the access to such partitions? I am not sure that we need to
add any new ownership checks here as CLUOPT_RECHECK gets added to the
parameters in cluster() before calling cluster_multiple_rels(), then
we do a mix of try_relation_open() with a skip when we are not the
owner anymore. So this logic looks sound to me. In short, you don't
need this extra check, and the test proposed in 0002 keeps the same
behavior.

2. We should silently skip a partition that's a foreign table, I
suppose.

I think it's not needed, since the loop over index children doesn't see a child
index on the foreign table?

Hmm. That may be a sign to add an assertion, at least, or something
based on RELKIND_HAS_STORAGE().

I was wondering what 0001 was doing here as that's a separate issue,
but it looked fine so I have applied it.

+       /* Use a permanent memory context for the result list */
+       old_context = MemoryContextSwitchTo(cluster_context);
+
        rtc = (RelToCluster *) palloc(sizeof(RelToCluster));

Independently of the extra ownership check, the memory context
manipulation has to be fixed and the code shoudl switch to
RelToCluster only when saving an item.

+CREATE ROLE ptnowner;
Roles that are created in the regression tests need to be prefixed
with "regress_", or some buildfarm members will complain. FWIW, I
enforce -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS in all my dev
builds.

I have added an open item for now, but the whole looks
straight-forward to me.
--
Michael

#32

Justin Pryzby

pryzby@telsasoft.com

almost 4 years ago

In reply to: Michael Paquier (#31)

Re: CLUSTER on partitioned index

On Wed, Apr 13, 2022 at 03:50:15PM +0900, Michael Paquier wrote:

That dates to a556549d7 (see also cbe24a6dd8 for an earlier commit in CLUSTER
itself). The reason was to avoid blocking if an unprivileged user runs VACUUM
FULL which would try to lock things (including shared catalogs) before checking
if they have permission to vacuum them. That commit also initially checks the
owner of the partitioned table, and then re-checking owner of partitions later
on.

A similar issue exists here. But 1) catalog tables are not partitioned, and,
2) ownership of a partitioned table is checked immediately. So the problem can
only occur if a user who owns a partitioned table but doesn't own all its
partitions tries to cluster it, and it blocks behind another session. Fixing
this is probably a good idea, but seems improbable that it would avoid a DOS.

Catalogs are out of the picture as you say and I would not worry about
them becoming somewhat partitioned even in the far future. Are you
saying that it is possible for a user kicking a CLUSTER command on a
partitioned table who has no ownership on some of the partitions to
do some blocking table_open() calls if the permission check is not
done in get_tables_to_cluster_partitioned()? Hence, this user could
block the access to such partitions? I am not sure that we need to
add any new ownership checks here as CLUOPT_RECHECK gets added to the
parameters in cluster() before calling cluster_multiple_rels(), then
we do a mix of try_relation_open() with a skip when we are not the
owner anymore. So this logic looks sound to me. In short, you don't
need this extra check, and the test proposed in 0002 keeps the same
behavior.

Are you sure? The ownership re-check in cluster_rel() occurs after acquiring
locks.

s1:
postgres=# CREATE TABLE p(i int) PARTITION BY LIST (i);
postgres=# CREATE TABLE p1 PARTITION OF p FOR VALUES IN (1);
postgres=# CREATE TABLE p2 PARTITION OF p FOR VALUES IN (2);
postgres=# CREATE INDEX ON p (i);
postgres=# CREATE ROLE po WITH LOGIN;
postgres=# ALTER TABLE p OWNER TO po;
postgres=# begin; SELECT FROM p1;

s2:
postgres=> SET client_min_messages =debug;
postgres=> CLUSTER VERBOSE p USING p_i_idx ;
LOG: process 26058 still waiting for AccessExclusiveLock on relation 39577 of database 5 after 1000.105 ms
postgres=> SELECT 39577::regclass;
regclass | p1

#33

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Justin Pryzby (#32)

Re: CLUSTER on partitioned index

On Wed, Apr 13, 2022 at 05:52:14AM -0500, Justin Pryzby wrote:

Are you sure? The ownership re-check in cluster_rel() occurs after acquiring
locks.

Yep, you are right. However, the SQL test does not check for this
blocking scenario. In short, removing the new ACL check in
get_tables_to_cluster_partitioned() makes the test behave the same
way. Could you implement an isolation check to make sure that the
difference is visible? The SQL check looks useful in itself, either
way.
--
Michael

#34

Alvaro Herrera

alvherre@alvh.no-ip.org

over 3 years ago

In reply to: Justin Pryzby (#29)

Re: CLUSTER on partitioned index

Thanks for the patch -- I have pushed it now, with some wording changes
and renaming the role to regress_* to avoid buildfarm's ire.

Michaël in addition proposes an isolation test. I'm not sure; is it
worth the additional test run time? It doesn't seem a critical issue.
But if anybody feels like contributing one, step right ahead.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/

#35

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Alvaro Herrera (#34)

1 attachment(s)

Re: CLUSTER on partitioned index

On Thu, Apr 14, 2022 at 10:37:06PM +0200, Alvaro Herrera wrote:

Thanks for the patch -- I have pushed it now, with some wording changes
and renaming the role to regress_* to avoid buildfarm's ire.

Cool, thanks.

Michaël in addition proposes an isolation test. I'm not sure; is it
worth the additional test run time? It doesn't seem a critical issue.
But if anybody feels like contributing one, step right ahead.

Well, I am a bit annoyed that we don't actually check that a CLUSTER
command does not block when doing a CLUSTER on a partitioned table
while a lock is held on one of its partitions. So, attached is a
proposal of patch to improve the test coverage in this area. While on
it, I have added a test with a normal table. You can see the
difference once you remove the ACL check added recently in
get_tables_to_cluster_partitioned(). What do you think?
--
Michael

Attachments:

0001-Add-isolation-tests-for-CLUSTER-with-partitions.patchtext/x-diff; charset=us-asciiDownload

From fdf5347bf4853f19341a8d67a655cae9fece15f0 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Sat, 16 Apr 2022 20:52:03 +0900
Subject: [PATCH] Add isolation tests for CLUSTER with partitions

---
 .../expected/cluster-conflict-partition.out   | 35 ++++++++++++++++++
 .../isolation/expected/cluster-conflict.out   | 19 ++++++++++
 src/test/isolation/isolation_schedule         |  2 +
 .../specs/cluster-conflict-partition.spec     | 37 +++++++++++++++++++
 .../isolation/specs/cluster-conflict.spec     | 30 +++++++++++++++
 5 files changed, 123 insertions(+)
 create mode 100644 src/test/isolation/expected/cluster-conflict-partition.out
 create mode 100644 src/test/isolation/expected/cluster-conflict.out
 create mode 100644 src/test/isolation/specs/cluster-conflict-partition.spec
 create mode 100644 src/test/isolation/specs/cluster-conflict.spec

diff --git a/src/test/isolation/expected/cluster-conflict-partition.out b/src/test/isolation/expected/cluster-conflict-partition.out
new file mode 100644
index 0000000000..7acb675c97
--- /dev/null
+++ b/src/test/isolation/expected/cluster-conflict-partition.out
@@ -0,0 +1,35 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_lock_parent s2_auth s2_cluster s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock_parent: LOCK cluster_part_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_cluster_part;
+step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind; <waiting ...>
+step s1_commit: COMMIT;
+step s2_cluster: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock_parent s2_cluster s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_cluster_part;
+step s1_lock_parent: LOCK cluster_part_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind; <waiting ...>
+step s1_commit: COMMIT;
+step s2_cluster: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s1_lock_child s2_auth s2_cluster s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock_child: LOCK cluster_part_tab1 IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_cluster_part;
+step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock_child s2_cluster s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_cluster_part;
+step s1_lock_child: LOCK cluster_part_tab1 IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_cluster: CLUSTER cluster_part_tab USING cluster_part_ind;
+step s1_commit: COMMIT;
+step s2_reset: RESET ROLE;
diff --git a/src/test/isolation/expected/cluster-conflict.out b/src/test/isolation/expected/cluster-conflict.out
new file mode 100644
index 0000000000..614d8f9d15
--- /dev/null
+++ b/src/test/isolation/expected/cluster-conflict.out
@@ -0,0 +1,19 @@
+Parsed test spec with 2 sessions
+
+starting permutation: s1_begin s1_lock s2_auth s2_cluster s1_commit s2_reset
+step s1_begin: BEGIN;
+step s1_lock: LOCK cluster_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_auth: SET ROLE regress_cluster_conflict;
+step s2_cluster: CLUSTER cluster_tab USING cluster_ind; <waiting ...>
+step s1_commit: COMMIT;
+step s2_cluster: <... completed>
+step s2_reset: RESET ROLE;
+
+starting permutation: s1_begin s2_auth s1_lock s2_cluster s1_commit s2_reset
+step s1_begin: BEGIN;
+step s2_auth: SET ROLE regress_cluster_conflict;
+step s1_lock: LOCK cluster_tab IN SHARE UPDATE EXCLUSIVE MODE;
+step s2_cluster: CLUSTER cluster_tab USING cluster_ind; <waiting ...>
+step s1_commit: COMMIT;
+step s2_cluster: <... completed>
+step s2_reset: RESET ROLE;
diff --git a/src/test/isolation/isolation_schedule b/src/test/isolation/isolation_schedule
index 607760386e..529a4cbd4d 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -102,6 +102,8 @@ test: partition-key-update-2
 test: partition-key-update-3
 test: partition-key-update-4
 test: plpgsql-toast
+test: cluster-conflict
+test: cluster-conflict-partition
 test: truncate-conflict
 test: serializable-parallel
 test: serializable-parallel-2
diff --git a/src/test/isolation/specs/cluster-conflict-partition.spec b/src/test/isolation/specs/cluster-conflict-partition.spec
new file mode 100644
index 0000000000..5091f684a9
--- /dev/null
+++ b/src/test/isolation/specs/cluster-conflict-partition.spec
@@ -0,0 +1,37 @@
+# Tests for locking conflicts with CLUSTER command and partitions.
+
+setup
+{
+	CREATE ROLE regress_cluster_part;
+	CREATE TABLE cluster_part_tab (a int) PARTITION BY LIST (a);
+	CREATE TABLE cluster_part_tab1 PARTITION OF cluster_part_tab FOR VALUES IN (1);
+	CREATE TABLE cluster_part_tab2 PARTITION OF cluster_part_tab FOR VALUES IN (2);
+	CREATE INDEX cluster_part_ind ON cluster_part_tab(a);
+	ALTER TABLE cluster_part_tab OWNER TO regress_cluster_part;
+}
+
+teardown
+{
+	DROP TABLE cluster_part_tab;
+	DROP ROLE regress_cluster_part;
+}
+
+session s1
+step s1_begin          { BEGIN; }
+step s1_lock_parent    { LOCK cluster_part_tab IN SHARE UPDATE EXCLUSIVE MODE; }
+step s1_lock_child     { LOCK cluster_part_tab1 IN SHARE UPDATE EXCLUSIVE MODE; }
+step s1_commit         { COMMIT; }
+
+session s2
+step s2_auth           { SET ROLE regress_cluster_part; }
+step s2_cluster        { CLUSTER cluster_part_tab USING cluster_part_ind; }
+step s2_reset          { RESET ROLE; }
+
+# CLUSTER on the parent waits if locked, passes for all cases.
+permutation s1_begin s1_lock_parent s2_auth s2_cluster s1_commit s2_reset
+permutation s1_begin s2_auth s1_lock_parent s2_cluster s1_commit s2_reset
+
+# When taking a lock on a partition leaf, CLUSTER on the parent skips
+# the leaf, passes for all cases.
+permutation s1_begin s1_lock_child s2_auth s2_cluster s1_commit s2_reset
+permutation s1_begin s2_auth s1_lock_child s2_cluster s1_commit s2_reset
diff --git a/src/test/isolation/specs/cluster-conflict.spec b/src/test/isolation/specs/cluster-conflict.spec
new file mode 100644
index 0000000000..2e1d547f01
--- /dev/null
+++ b/src/test/isolation/specs/cluster-conflict.spec
@@ -0,0 +1,30 @@
+# Tests for locking conflicts with CLUSTER command.
+
+setup
+{
+	CREATE ROLE regress_cluster_conflict;
+	CREATE TABLE cluster_tab (a int);
+	CREATE INDEX cluster_ind ON cluster_tab(a);
+	ALTER TABLE cluster_tab OWNER TO regress_cluster_conflict;
+}
+
+teardown
+{
+	DROP TABLE cluster_tab;
+	DROP ROLE regress_cluster_conflict;
+}
+
+session s1
+step s1_begin          { BEGIN; }
+step s1_lock           { LOCK cluster_tab IN SHARE UPDATE EXCLUSIVE MODE; }
+step s1_commit         { COMMIT; }
+
+session s2
+step s2_auth           { SET ROLE regress_cluster_conflict; }
+step s2_cluster        { CLUSTER cluster_tab USING cluster_ind; }
+step s2_reset          { RESET ROLE; }
+
+# The role has privileges to cluster the table, CLUSTER will block if
+# another session holds a lock on the table and succeed in all cases.
+permutation s1_begin s1_lock s2_auth s2_cluster s1_commit s2_reset
+permutation s1_begin s2_auth s1_lock s2_cluster s1_commit s2_reset
-- 
2.35.2

#36

Michael Paquier

michael@paquier.xyz

over 3 years ago

In reply to: Michael Paquier (#35)

Re: CLUSTER on partitioned index

On Sat, Apr 16, 2022 at 08:58:50PM +0900, Michael Paquier wrote:

Well, I am a bit annoyed that we don't actually check that a CLUSTER
command does not block when doing a CLUSTER on a partitioned table
while a lock is held on one of its partitions. So, attached is a
proposal of patch to improve the test coverage in this area. While on
it, I have added a test with a normal table. You can see the
difference once you remove the ACL check added recently in
get_tables_to_cluster_partitioned(). What do you think?

This was the last reason why this was listed as an open item, so,
hearing nothing, I have applied this patch to add those extra tests,
and switched the item as fixed.
--
Michael

#37

Alvaro Herrera

alvherre@alvh.no-ip.org

over 3 years ago

In reply to: Michael Paquier (#36)

Re: CLUSTER on partitioned index

On 2022-Apr-26, Michael Paquier wrote:

On Sat, Apr 16, 2022 at 08:58:50PM +0900, Michael Paquier wrote:

Well, I am a bit annoyed that we don't actually check that a CLUSTER
command does not block when doing a CLUSTER on a partitioned table
while a lock is held on one of its partitions. So, attached is a
proposal of patch to improve the test coverage in this area.

This was the last reason why this was listed as an open item, so,
hearing nothing, I have applied this patch to add those extra tests,
and switched the item as fixed.

Thank you!

--
Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/