Add Index-level REINDEX with multiple jobs

Started by Maxim Orlovabout 2 years ago10 messages

orlovmg@gmail.com

about 2 years ago

1 attachment(s)

Hi!

Recently, one of our customers came to us with the question: why do reindex
utility does not support multiple jobs for indices (-i opt)?
And, of course, it is because we cannot control the concurrent processing
of multiple indexes on the same relation. This was
discussed somewhere in [0]/messages/by-id/CAOBaU_YrnH_Jqo46NhaJ7uRBiWWEcS40VNRQxgFbqYo9kApUsg@mail.gmail.com, I believe. So, customer have to make a shell
script to do his business and so on.

But. This seems to be not that complicated to split indices by parent
tables and do reindex in multiple jobs? Or I miss something?
PFA patch implementing this.

As always, any opinions are very welcome!

[0]: /messages/by-id/CAOBaU_YrnH_Jqo46NhaJ7uRBiWWEcS40VNRQxgFbqYo9kApUsg@mail.gmail.com
/messages/by-id/CAOBaU_YrnH_Jqo46NhaJ7uRBiWWEcS40VNRQxgFbqYo9kApUsg@mail.gmail.com

--
Best regards,
Maxim Orlov.

Attachments:

v1-0001-Add-Index-level-REINDEX-with-multiple-jobs.patchapplication/octet-stream; name=v1-0001-Add-Index-level-REINDEX-with-multiple-jobs.patchDownload

From 562c1bceaa82ef34691bce71b8e349825a7167c8 Mon Sep 17 00:00:00 2001
From: Svetlana Derevyanko <s.derevyanko@postgrespro.ru>
Date: Fri, 29 Dec 2023 08:20:54 +0300
Subject: [PATCH v1] Add Index-level REINDEX with multiple jobs

Author: Maxim Orlov <orlovmg@gmail.com>, Svetlana Derevyanko <s.derevyanko@postgrespro.ru>
---
 doc/src/sgml/ref/reindexdb.sgml    |   3 +-
 src/bin/scripts/reindexdb.c        | 158 ++++++++++++++++++++++++++---
 src/bin/scripts/t/090_reindexdb.pl |   8 +-
 3 files changed, 150 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index 8d9ced212f..dfb5e534fb 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -187,8 +187,7 @@ PostgreSQL documentation
         setting is high enough to accommodate all connections.
        </para>
        <para>
-        Note that this option is incompatible with the <option>--index</option>
-        and <option>--system</option> options.
+        Note that this option is incompatible with the <option>--system</option> option.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index ab7f190850..699a332bac 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -251,14 +251,6 @@ main(int argc, char *argv[])
 	}
 	else
 	{
-		/*
-		 * Index-level REINDEX is not supported with multiple jobs as we
-		 * cannot control the concurrent processing of multiple indexes
-		 * depending on the same relation.
-		 */
-		if (concurrentCons > 1 && indexes.head != NULL)
-			pg_fatal("cannot use multiple jobs to reindex indexes");
-
 		if (dbname == NULL)
 		{
 			if (getenv("PGDATABASE"))
@@ -279,7 +271,7 @@ main(int argc, char *argv[])
 		if (indexes.head != NULL)
 			reindex_one_database(&cparams, REINDEX_INDEX, &indexes,
 								 progname, echo, verbose,
-								 concurrently, 1, tablespace);
+								 concurrently, concurrentCons, tablespace);
 
 		if (tables.head != NULL)
 			reindex_one_database(&cparams, REINDEX_TABLE, &tables,
@@ -299,6 +291,67 @@ main(int argc, char *argv[])
 	exit(0);
 }
 
+static SimpleStringList *
+sort_indices_by_tables(PGconn *conn, SimpleStringList *user_list, SimpleStringList *indices_tables_list,
+					   SimpleStringList ***indices_process_list, bool echo)
+{
+	SimpleStringList *process_list = pg_malloc0(sizeof(SimpleStringList));
+	SimpleStringListCell *idx_cell,
+			   *idx_table,
+			   *cell;
+	int			ipl_len = 10,
+				current_tables_num = 0,
+				i;
+
+	*indices_process_list = pg_malloc0(sizeof(*indices_process_list) * ipl_len);
+
+	for (idx_cell = user_list->head, idx_table = indices_tables_list->head;
+		 idx_cell;
+		 idx_cell = idx_cell->next, idx_table = idx_table->next)
+	{
+		SimpleStringList *list_to_add = NULL;
+
+		if (!current_tables_num)
+		{
+			simple_string_list_append(process_list, idx_table->val);
+			(*indices_process_list)[0] = pg_malloc0(sizeof(SimpleStringList));
+			current_tables_num++;
+		}
+
+		cell = process_list->head;
+		for (i = 0; i < current_tables_num; i++)
+		{
+			if (!strcmp(idx_table->val, cell->val))
+			{
+				list_to_add = (*indices_process_list)[i];
+				break;
+			}
+
+		}
+
+		if (!list_to_add)
+		{
+			simple_string_list_append(process_list, idx_table->val);
+
+			if (current_tables_num >= ipl_len)
+			{
+				ipl_len *= 2;
+				*indices_process_list = pg_realloc(*indices_process_list,
+												   sizeof(*indices_process_list) * ipl_len);
+			}
+
+			(*indices_process_list)[current_tables_num] = pg_malloc0(sizeof(SimpleStringList));
+			list_to_add = (*indices_process_list)[current_tables_num];
+			current_tables_num++;
+		}
+
+		simple_string_list_append(list_to_add, idx_cell->val);
+
+	}
+
+	return process_list;
+}
+
 static void
 reindex_one_database(ConnParams *cparams, ReindexType type,
 					 SimpleStringList *user_list,
@@ -310,10 +363,13 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 	SimpleStringListCell *cell;
 	bool		parallel = concurrentCons > 1;
 	SimpleStringList *process_list = user_list;
+	SimpleStringList **indices_process_list = NULL;
+	SimpleStringList *indices_tables_list = NULL;
 	ReindexType process_type = type;
 	ParallelSlotArray *sa;
 	bool		failed = false;
-	int			items_count = 0;
+	int			items_count = 0,
+				i = 0;
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
@@ -383,8 +439,23 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 					return;
 				break;
 
-			case REINDEX_SYSTEM:
 			case REINDEX_INDEX:
+				Assert(user_list != NULL);
+
+				/* Build a list of relations from the indices */
+				indices_tables_list = get_parallel_object_list(conn, process_type,
+															   user_list, echo);
+				/* Distribute indices by tables */
+				process_list = sort_indices_by_tables(conn, user_list, indices_tables_list,
+													  &indices_process_list, echo);
+				simple_string_list_destroy(indices_tables_list);
+
+				/* Bail out if nothing to process */
+				if (process_list == NULL)
+					return;
+				break;
+
+			case REINDEX_SYSTEM:
 				/* not supported */
 				Assert(false);
 				break;
@@ -440,8 +511,19 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_reindex_command(free_slot->connection, process_type, objname,
-							echo, verbose, concurrently, true, tablespace);
+		if (parallel && process_type == REINDEX_INDEX)
+		{
+			SimpleStringListCell *idx_cell;
+
+			for (idx_cell = indices_process_list[i]->head; idx_cell; idx_cell = idx_cell->next)
+				run_reindex_command(free_slot->connection, process_type, idx_cell->val,
+									echo, verbose, concurrently, true, tablespace);
+			simple_string_list_destroy(indices_process_list[i]);
+			i++;
+		}
+		else
+			run_reindex_command(free_slot->connection, process_type, objname,
+								echo, verbose, concurrently, true, tablespace);
 
 		cell = cell->next;
 	} while (cell != NULL);
@@ -456,6 +538,9 @@ finish:
 		pg_free(process_list);
 	}
 
+	if (indices_process_list)
+		pg_free(indices_process_list);
+
 	ParallelSlotsTerminate(sa);
 	pfree(sa);
 
@@ -667,8 +752,53 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 			}
 			break;
 
-		case REINDEX_SYSTEM:
 		case REINDEX_INDEX:
+			{
+				SimpleStringListCell *cell;
+				bool		idx_listed = false;
+
+				Assert(user_list != NULL);
+
+				/*
+				 * Straight forward index-level REINDEX is not supported with
+				 * multiple jobs as we cannot control the concurrent
+				 * processing of multiple indexes depending on the same
+				 * relation.  But we can extract appropriate table name for
+				 * the index and put REINDEX INDEX commands into different
+				 * jobs, accordingly to the parent tables.
+				 */
+				appendPQExpBufferStr(&catalog_query,
+									 "WITH idx as (\n"
+									 "  SELECT c.relname, ns.nspname\n"
+									 "  FROM pg_catalog.pg_class c,\n"
+									 "       pg_catalog.pg_namespace ns\n"
+									 "  WHERE c.relnamespace OPERATOR(pg_catalog.=) ns.oid AND\n"
+									 "        c.oid OPERATOR(pg_catalog.=) ANY(ARRAY['\n");
+
+				for (cell = user_list->head; cell; cell = cell->next)
+				{
+					if (idx_listed)
+						appendPQExpBufferStr(&catalog_query, "', '");
+					else
+						idx_listed = true;
+
+					appendQualifiedRelation(&catalog_query, cell->val, conn, echo);
+				}
+
+				appendPQExpBufferStr(&catalog_query,
+									 "'\n"
+									 "        ]::pg_catalog.regclass[])\n"
+									 ") \n"
+									 "SELECT pi.tablename, pi.schemaname \n"
+									 "FROM pg_catalog.pg_indexes as pi, idx\n"
+									 "WHERE pi.schemaname OPERATOR(pg_catalog.=) idx.nspname AND \n"
+									 "      pi.indexname OPERATOR(pg_catalog.=) idx.relname;\n");
+
+				executeCommand(conn, "RESET search_path;", false);
+			}
+			break;
+
+		case REINDEX_SYSTEM:
 		case REINDEX_TABLE:
 			Assert(false);
 			break;
diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl
index b663d0e741..339cb98870 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -236,9 +236,11 @@ $node->safe_psql(
 	CREATE SCHEMA s1;
 	CREATE TABLE s1.t1(id integer);
 	CREATE INDEX ON s1.t1(id);
+	CREATE INDEX i1 ON s1.t1(id);
 	CREATE SCHEMA s2;
 	CREATE TABLE s2.t2(id integer);
 	CREATE INDEX ON s2.t2(id);
+	CREATE INDEX i2 ON s2.t2(id);
 	-- empty schema
 	CREATE SCHEMA s3;
 |);
@@ -246,9 +248,9 @@ $node->safe_psql(
 $node->command_fails(
 	[ 'reindexdb', '-j', '2', '-s', 'postgres' ],
 	'parallel reindexdb cannot process system catalogs');
-$node->command_fails(
-	[ 'reindexdb', '-j', '2', '-i', 'i1', 'postgres' ],
-	'parallel reindexdb cannot process indexes');
+$node->command_ok(
+	[ 'reindexdb', '-j', '2', '-i', 's1.i1', '-i', 's2.i2', 'postgres' ],
+	'parallel reindexdb for indices');
 # Note that the ordering of the commands is not stable, so the second
 # command for s2.t2 is not checked after.
 $node->issues_sql_like(
-- 
2.42.0

Michael Paquier

michael@paquier.xyz

almost 2 years ago

In reply to: Maxim Orlov (#1)

Re: Add Index-level REINDEX with multiple jobs

On Fri, Dec 29, 2023 at 04:15:35PM +0300, Maxim Orlov wrote:

Recently, one of our customers came to us with the question: why do reindex
utility does not support multiple jobs for indices (-i opt)?
And, of course, it is because we cannot control the concurrent processing
of multiple indexes on the same relation. This was
discussed somewhere in [0], I believe. So, customer have to make a shell
script to do his business and so on.

Yep, that should be the correct thread. As far as I recall, one major
reason was code simplicity because dealing with parallel jobs at table
level is a no-brainer on the client side (see 0003): we know that
relations with physical storage will never interact with each other.

But. This seems to be not that complicated to split indices by parent
tables and do reindex in multiple jobs? Or I miss something?
PFA patch implementing this.

+   appendPQExpBufferStr(&catalog_query,
+                       "WITH idx as (\n"
+                       "  SELECT c.relname, ns.nspname\n"
+                       "  FROM pg_catalog.pg_class c,\n"
+                       "       pg_catalog.pg_namespace ns\n"
+                       "  WHERE c.relnamespace OPERATOR(pg_catalog.=) ns.oid AND\n"
+                       "        c.oid OPERATOR(pg_catalog.=) ANY(ARRAY['\n");

The problem may be actually trickier than that, no? Could there be
other factors to take into account for their classification, like
their sizes (typically, we'd want to process the biggest one first, I
guess)?
--
Michael

Andrey M. Borodin

x4mmm@yandex-team.ru

almost 2 years ago

In reply to: Michael Paquier (#2)

Re: Add Index-level REINDEX with multiple jobs

On 6 Feb 2024, at 11:21, Michael Paquier <michael@paquier.xyz> wrote:

The problem may be actually trickier than that, no? Could there be
other factors to take into account for their classification, like
their sizes (typically, we'd want to process the biggest one first, I
guess)?

Maxim, what do you think about it?

Best regards, Andrey Borodin.

Svetlana Derevyanko

s.derevyanko@postgrespro.ru

almost 2 years ago

In reply to: Andrey M. Borodin (#3)

1 attachment(s)

Re: Add Index-level REINDEX with multiple jobs

Andrey M. Borodin писал(а) 2024-03-04 09:26:

On 6 Feb 2024, at 11:21, Michael Paquier <michael@paquier.xyz> wrote:

The problem may be actually trickier than that, no? Could there be
other factors to take into account for their classification, like
their sizes (typically, we'd want to process the biggest one first, I
guess)?

Maxim, what do you think about it?

Best regards, Andrey Borodin.

I agree that, as is the case with tables in REINDEX SCHEME, indices
should be sorted accordingly to their size.

Attaching the second version of patch, with indices being sorted by
size.

Best regards,
Svetlana Derevyanko

Attachments:

v2-0001-Add-Index-level-REINDEX-with-multiple-jobs.patchtext/x-diff; name=v2-0001-Add-Index-level-REINDEX-with-multiple-jobs.patchDownload

From 3c2f0fcbcf382cc2cb9786e26ad8027558937ea2 Mon Sep 17 00:00:00 2001
From: Svetlana Derevyanko <s.derevyanko@postgrespro.ru>
Date: Fri, 29 Dec 2023 08:20:54 +0300
Subject: [PATCH v2] Add Index-level REINDEX with multiple jobs

Author: Maxim Orlov <orlovmg@gmail.com>, Svetlana Derevyanko <s.derevyanko@postgrespro.ru>
---
 doc/src/sgml/ref/reindexdb.sgml    |   3 +-
 src/bin/scripts/reindexdb.c        | 159 ++++++++++++++++++++++++++---
 src/bin/scripts/t/090_reindexdb.pl |   8 +-
 3 files changed, 151 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index 8d9ced212f..dfb5e534fb 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -187,8 +187,7 @@ PostgreSQL documentation
         setting is high enough to accommodate all connections.
        </para>
        <para>
-        Note that this option is incompatible with the <option>--index</option>
-        and <option>--system</option> options.
+        Note that this option is incompatible with the <option>--system</option> option.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 6ae30dff31..63d99e7441 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -251,14 +251,6 @@ main(int argc, char *argv[])
 	}
 	else
 	{
-		/*
-		 * Index-level REINDEX is not supported with multiple jobs as we
-		 * cannot control the concurrent processing of multiple indexes
-		 * depending on the same relation.
-		 */
-		if (concurrentCons > 1 && indexes.head != NULL)
-			pg_fatal("cannot use multiple jobs to reindex indexes");
-
 		if (dbname == NULL)
 		{
 			if (getenv("PGDATABASE"))
@@ -279,7 +271,7 @@ main(int argc, char *argv[])
 		if (indexes.head != NULL)
 			reindex_one_database(&cparams, REINDEX_INDEX, &indexes,
 								 progname, echo, verbose,
-								 concurrently, 1, tablespace);
+								 concurrently, concurrentCons, tablespace);
 
 		if (tables.head != NULL)
 			reindex_one_database(&cparams, REINDEX_TABLE, &tables,
@@ -299,6 +291,67 @@ main(int argc, char *argv[])
 	exit(0);
 }
 
+static SimpleStringList *
+sort_indices_by_tables(PGconn *conn, SimpleStringList *user_list, SimpleStringList *indices_tables_list,
+					   SimpleStringList ***indices_process_list, bool echo)
+{
+	SimpleStringList *process_list = pg_malloc0(sizeof(SimpleStringList));
+	SimpleStringListCell *idx_cell,
+			   *idx_table,
+			   *cell;
+	int			ipl_len = 10,
+				current_tables_num = 0,
+				i;
+
+	*indices_process_list = pg_malloc0(sizeof(*indices_process_list) * ipl_len);
+
+	for (idx_cell = user_list->head, idx_table = indices_tables_list->head;
+		 idx_cell;
+		 idx_cell = idx_cell->next, idx_table = idx_table->next)
+	{
+		SimpleStringList *list_to_add = NULL;
+
+		if (!current_tables_num)
+		{
+			simple_string_list_append(process_list, idx_table->val);
+			(*indices_process_list)[0] = pg_malloc0(sizeof(SimpleStringList));
+			current_tables_num++;
+		}
+
+		cell = process_list->head;
+		for (i = 0; i < current_tables_num; i++)
+		{
+			if (!strcmp(idx_table->val, cell->val))
+			{
+				list_to_add = (*indices_process_list)[i];
+				break;
+			}
+
+		}
+
+		if (!list_to_add)
+		{
+			simple_string_list_append(process_list, idx_table->val);
+
+			if (current_tables_num >= ipl_len)
+			{
+				ipl_len *= 2;
+				*indices_process_list = pg_realloc(*indices_process_list,
+												   sizeof(*indices_process_list) * ipl_len);
+			}
+
+			(*indices_process_list)[current_tables_num] = pg_malloc0(sizeof(SimpleStringList));
+			list_to_add = (*indices_process_list)[current_tables_num];
+			current_tables_num++;
+		}
+
+		simple_string_list_append(list_to_add, idx_cell->val);
+
+	}
+
+	return process_list;
+}
+
 static void
 reindex_one_database(ConnParams *cparams, ReindexType type,
 					 SimpleStringList *user_list,
@@ -310,10 +363,13 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 	SimpleStringListCell *cell;
 	bool		parallel = concurrentCons > 1;
 	SimpleStringList *process_list = user_list;
+	SimpleStringList **indices_process_list = NULL;
+	SimpleStringList *indices_tables_list = NULL;
 	ReindexType process_type = type;
 	ParallelSlotArray *sa;
 	bool		failed = false;
-	int			items_count = 0;
+	int			items_count = 0,
+				i = 0;
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
@@ -383,8 +439,23 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 					return;
 				break;
 
-			case REINDEX_SYSTEM:
 			case REINDEX_INDEX:
+				Assert(user_list != NULL);
+
+				/* Build a list of relations from the indices */
+				indices_tables_list = get_parallel_object_list(conn, process_type,
+															   user_list, echo);
+				/* Distribute indices by tables */
+				process_list = sort_indices_by_tables(conn, user_list, indices_tables_list,
+													  &indices_process_list, echo);
+				simple_string_list_destroy(indices_tables_list);
+
+				/* Bail out if nothing to process */
+				if (process_list == NULL)
+					return;
+				break;
+
+			case REINDEX_SYSTEM:
 				/* not supported */
 				Assert(false);
 				break;
@@ -440,8 +511,19 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 		}
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
-		run_reindex_command(free_slot->connection, process_type, objname,
-							echo, verbose, concurrently, true, tablespace);
+		if (parallel && process_type == REINDEX_INDEX)
+		{
+			SimpleStringListCell *idx_cell;
+
+			for (idx_cell = indices_process_list[i]->head; idx_cell; idx_cell = idx_cell->next)
+				run_reindex_command(free_slot->connection, process_type, idx_cell->val,
+									echo, verbose, concurrently, true, tablespace);
+			simple_string_list_destroy(indices_process_list[i]);
+			i++;
+		}
+		else
+			run_reindex_command(free_slot->connection, process_type, objname,
+								echo, verbose, concurrently, true, tablespace);
 
 		cell = cell->next;
 	} while (cell != NULL);
@@ -456,6 +538,9 @@ finish:
 		pg_free(process_list);
 	}
 
+	if (indices_process_list)
+		pg_free(indices_process_list);
+
 	ParallelSlotsTerminate(sa);
 	pfree(sa);
 
@@ -667,8 +752,54 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 			}
 			break;
 
-		case REINDEX_SYSTEM:
 		case REINDEX_INDEX:
+			{
+				SimpleStringListCell *cell;
+				bool		idx_listed = false;
+
+				Assert(user_list != NULL);
+
+				/*
+				 * Straight forward index-level REINDEX is not supported with
+				 * multiple jobs as we cannot control the concurrent
+				 * processing of multiple indexes depending on the same
+				 * relation.  But we can extract appropriate table name for
+				 * the index and put REINDEX INDEX commands into different
+				 * jobs, accordingly to the parent tables.
+				 */
+				appendPQExpBufferStr(&catalog_query,
+									 "WITH idx as (\n"
+									 "  SELECT c.relname, c.relpages, ns.nspname\n"
+									 "  FROM pg_catalog.pg_class c,\n"
+									 "       pg_catalog.pg_namespace ns\n"
+									 "  WHERE c.relnamespace OPERATOR(pg_catalog.=) ns.oid AND\n"
+									 "        c.oid OPERATOR(pg_catalog.=) ANY(ARRAY['\n");
+
+				for (cell = user_list->head; cell; cell = cell->next)
+				{
+					if (idx_listed)
+						appendPQExpBufferStr(&catalog_query, "', '");
+					else
+						idx_listed = true;
+
+					appendQualifiedRelation(&catalog_query, cell->val, conn, echo);
+				}
+
+				appendPQExpBufferStr(&catalog_query,
+									 "'\n"
+									 "        ]::pg_catalog.regclass[])\n"
+									 ") \n"
+									 "SELECT pi.tablename, pi.schemaname \n"
+									 "FROM pg_catalog.pg_indexes as pi, idx\n"
+									 "WHERE pi.schemaname OPERATOR(pg_catalog.=) idx.nspname AND \n"
+									 "      pi.indexname OPERATOR(pg_catalog.=) idx.relname\n"
+									 "ORDER BY idx.relpages DESC;");
+
+				executeCommand(conn, "RESET search_path;", false);
+			}
+			break;
+
+		case REINDEX_SYSTEM:
 		case REINDEX_TABLE:
 			Assert(false);
 			break;
diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl
index 4f1a141132..cacaac4940 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -236,9 +236,11 @@ $node->safe_psql(
 	CREATE SCHEMA s1;
 	CREATE TABLE s1.t1(id integer);
 	CREATE INDEX ON s1.t1(id);
+	CREATE INDEX i1 ON s1.t1(id);
 	CREATE SCHEMA s2;
 	CREATE TABLE s2.t2(id integer);
 	CREATE INDEX ON s2.t2(id);
+	CREATE INDEX i2 ON s2.t2(id);
 	-- empty schema
 	CREATE SCHEMA s3;
 |);
@@ -246,9 +248,9 @@ $node->safe_psql(
 $node->command_fails(
 	[ 'reindexdb', '-j', '2', '-s', 'postgres' ],
 	'parallel reindexdb cannot process system catalogs');
-$node->command_fails(
-	[ 'reindexdb', '-j', '2', '-i', 'i1', 'postgres' ],
-	'parallel reindexdb cannot process indexes');
+$node->command_ok(
+	[ 'reindexdb', '-j', '2', '-i', 's1.i1', '-i', 's2.i2', 'postgres' ],
+	'parallel reindexdb for indices');
 # Note that the ordering of the commands is not stable, so the second
 # command for s2.t2 is not checked after.
 $node->issues_sql_like(
-- 
2.34.1

Maxim Orlov

orlovmg@gmail.com

almost 2 years ago

In reply to: Michael Paquier (#2)

Re: Add Index-level REINDEX with multiple jobs

On Tue, 6 Feb 2024 at 09:22, Michael Paquier <michael@paquier.xyz> wrote:

The problem may be actually trickier than that, no? Could there be
other factors to take into account for their classification, like
their sizes (typically, we'd want to process the biggest one first, I
guess)?

Sorry for a late reply. Thanks for an explanation. This is sounds
reasonable to me.
Svetlana had addressed this in the patch v2.

--
Best regards,
Maxim Orlov.

Alexander Korotkov

aekorotkov@gmail.com

almost 2 years ago

In reply to: Maxim Orlov (#5)

1 attachment(s)

Re: Add Index-level REINDEX with multiple jobs

On Mon, Mar 11, 2024 at 3:44 PM Maxim Orlov <orlovmg@gmail.com> wrote:

On Tue, 6 Feb 2024 at 09:22, Michael Paquier <michael@paquier.xyz> wrote:

The problem may be actually trickier than that, no? Could there be
other factors to take into account for their classification, like
their sizes (typically, we'd want to process the biggest one first, I
guess)?

Sorry for a late reply. Thanks for an explanation. This is sounds reasonable to me.
Svetlana had addressed this in the patch v2.

I think this patch is a nice improvement. But it doesn't seem to be
implemented in the right way. There is no guarantee that
get_parallel_object_list() will return tables in the same order as
indexes. Especially when there is "ORDER BY idx.relpages". Also,
sort_indices_by_tables() has quadratic complexity (probably OK since
input list shouldn't be too lengthy) and a bit awkward.

I've revised the patchset. Now appropriate ordering is made in SQL
query. The original list of indexes is modified to match the list of
tables. The tables are ordered by the size of its greatest index,
within table indexes are ordered by size.

I'm going to further revise this patch, mostly comments and the commit message.

------
Regards,
Alexander Korotkov

Attachments:

v3-0001-Add-Index-level-REINDEX-with-multiple-jobs.patchapplication/octet-stream; name=v3-0001-Add-Index-level-REINDEX-with-multiple-jobs.patchDownload

From e03528f833999cb6b806279ecdba7c42acd71ac0 Mon Sep 17 00:00:00 2001
From: Svetlana Derevyanko <s.derevyanko@postgrespro.ru>
Date: Fri, 29 Dec 2023 08:20:54 +0300
Subject: [PATCH v3] Add Index-level REINDEX with multiple jobs

Author: Maxim Orlov <orlovmg@gmail.com>, Svetlana Derevyanko <s.derevyanko@postgrespro.ru>
---
 doc/src/sgml/ref/reindexdb.sgml    |   3 +-
 src/bin/scripts/reindexdb.c        | 120 +++++++++++++++++++++++------
 src/bin/scripts/t/090_reindexdb.pl |   8 +-
 3 files changed, 104 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index a877439dc5b..98c3333228f 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -179,8 +179,7 @@ PostgreSQL documentation
         setting is high enough to accommodate all connections.
        </para>
        <para>
-        Note that this option is incompatible with the <option>--index</option>
-        and <option>--system</option> options.
+        Note that this option is incompatible with the <option>--system</option> option.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 231e5c8fd02..91e5c633ef1 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -206,19 +206,8 @@ main(int argc, char *argv[])
 
 	setup_cancel_handler(NULL);
 
-	if (concurrentCons > 1)
-	{
-		/*
-		 * Index-level REINDEX is not supported with multiple jobs as we
-		 * cannot control the concurrent processing of multiple indexes
-		 * depending on the same relation.
-		 */
-		if (indexes.head != NULL)
-			pg_fatal("cannot use multiple jobs to reindex indexes");
-
-		if (syscatalog)
-			pg_fatal("cannot use multiple jobs to reindex system catalogs");
-	}
+	if (concurrentCons > 1 && syscatalog)
+		pg_fatal("cannot use multiple jobs to reindex system catalogs");
 
 	if (alldb)
 	{
@@ -258,7 +247,7 @@ main(int argc, char *argv[])
 		if (indexes.head != NULL)
 			reindex_one_database(&cparams, REINDEX_INDEX, &indexes,
 								 progname, echo, verbose,
-								 concurrently, 1, tablespace);
+								 concurrently, concurrentCons, tablespace);
 
 		if (tables.head != NULL)
 			reindex_one_database(&cparams, REINDEX_TABLE, &tables,
@@ -288,12 +277,16 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 {
 	PGconn	   *conn;
 	SimpleStringListCell *cell;
+	SimpleStringListCell *indices_tables_cell;
 	bool		parallel = concurrentCons > 1;
 	SimpleStringList *process_list = user_list;
+	SimpleStringList *indices_tables_list = NULL;
 	ReindexType process_type = type;
 	ParallelSlotArray *sa;
 	bool		failed = false;
 	int			items_count = 0;
+	char	   *prev_index_table_name = NULL;
+	ParallelSlot *free_slot = NULL;
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
@@ -363,8 +356,22 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 					return;
 				break;
 
-			case REINDEX_SYSTEM:
 			case REINDEX_INDEX:
+				Assert(user_list != NULL);
+
+				/* Build a list of relations from the indices */
+				indices_tables_list = get_parallel_object_list(conn, process_type,
+															   user_list, echo);
+
+				if (indices_tables_list)
+					indices_tables_cell = indices_tables_list->head;
+
+				/* Bail out if nothing to process */
+				if (process_list == NULL)
+					return;
+				break;
+
+			case REINDEX_SYSTEM:
 				/* not supported */
 				Assert(false);
 				break;
@@ -404,7 +411,7 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 	do
 	{
 		const char *objname = cell->val;
-		ParallelSlot *free_slot = NULL;
+		bool		need_new_slot = true;
 
 		if (CancelRequested)
 		{
@@ -412,14 +419,27 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 			goto finish;
 		}
 
-		free_slot = ParallelSlotsGetIdle(sa, NULL);
-		if (!free_slot)
+		if (parallel && process_type == REINDEX_INDEX)
 		{
-			failed = true;
-			goto finish;
+			if (prev_index_table_name != NULL &&
+				strcmp(prev_index_table_name, indices_tables_cell->val) == 0)
+				need_new_slot = false;
+			prev_index_table_name = indices_tables_cell->val;
+			indices_tables_cell = indices_tables_cell->next;
+		}
+
+		if (need_new_slot)
+		{
+			free_slot = ParallelSlotsGetIdle(sa, NULL);
+			if (!free_slot)
+			{
+				failed = true;
+				goto finish;
+			}
+
+			ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
 		}
 
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
 		run_reindex_command(free_slot->connection, process_type, objname,
 							echo, verbose, concurrently, true, tablespace);
 
@@ -436,6 +456,12 @@ finish:
 		pg_free(process_list);
 	}
 
+	if (indices_tables_list)
+	{
+		simple_string_list_destroy(indices_tables_list);
+		pg_free(indices_tables_list);
+	}
+
 	ParallelSlotsTerminate(sa);
 	pfree(sa);
 
@@ -647,8 +673,48 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 			}
 			break;
 
-		case REINDEX_SYSTEM:
 		case REINDEX_INDEX:
+			{
+				SimpleStringListCell *cell;
+
+				Assert(user_list != NULL);
+
+				/*
+				 * Straight forward index-level REINDEX is not supported with
+				 * multiple jobs as we cannot control the concurrent
+				 * processing of multiple indexes depending on the same
+				 * relation.  But we can extract appropriate table name for
+				 * the index and put REINDEX INDEX commands into different
+				 * jobs, accordingly to the parent tables.
+				 */
+				appendPQExpBufferStr(&catalog_query,
+									 "SELECT t.relname, n.nspname, i.relname\n"
+									 "FROM pg_catalog.pg_index x\n"
+									 "JOIN pg_catalog.pg_class t ON t.oid = x.indrelid\n"
+									 "JOIN pg_catalog.pg_class i ON i.oid = x.indexrelid\n"
+									 "LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.relnamespace\n"
+									 "WHERE x.indexrelid OPERATOR(pg_catalog.=) ANY(ARRAY['");
+
+				for (cell = user_list->head; cell; cell = cell->next)
+				{
+					if (cell != user_list->head)
+						appendPQExpBufferStr(&catalog_query, "', '");
+
+					appendQualifiedRelation(&catalog_query, cell->val, conn, echo);
+				}
+
+				appendPQExpBufferStr(&catalog_query,
+									 "']::pg_catalog.regclass[])\n"
+									 "ORDER BY max(i.relpages) OVER \n"
+									 "    (PARTITION BY n.nspname, t.relname),\n"
+									 "  n.nspname, t.relname, i.relpages;\n");
+
+				simple_string_list_destroy(user_list);
+				user_list->head = user_list->tail = NULL;
+			}
+			break;
+
+		case REINDEX_SYSTEM:
 		case REINDEX_TABLE:
 			Assert(false);
 			break;
@@ -680,6 +746,16 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 
 		simple_string_list_append(tables, buf.data);
 		resetPQExpBuffer(&buf);
+
+		if (type == REINDEX_INDEX)
+		{
+			appendPQExpBufferStr(&buf,
+								 fmtQualifiedId(PQgetvalue(res, i, 1),
+												PQgetvalue(res, i, 2)));
+
+			simple_string_list_append(user_list, buf.data);
+			resetPQExpBuffer(&buf);
+		}
 	}
 	termPQExpBuffer(&buf);
 	PQclear(res);
diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl
index 429dd3acd68..f02822da218 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -236,9 +236,11 @@ $node->safe_psql(
 	CREATE SCHEMA s1;
 	CREATE TABLE s1.t1(id integer);
 	CREATE INDEX ON s1.t1(id);
+	CREATE INDEX i1 ON s1.t1(id);
 	CREATE SCHEMA s2;
 	CREATE TABLE s2.t2(id integer);
 	CREATE INDEX ON s2.t2(id);
+	CREATE INDEX i2 ON s2.t2(id);
 	-- empty schema
 	CREATE SCHEMA s3;
 |);
@@ -246,9 +248,9 @@ $node->safe_psql(
 $node->command_fails(
 	[ 'reindexdb', '-j', '2', '-s', 'postgres' ],
 	'parallel reindexdb cannot process system catalogs');
-$node->command_fails(
-	[ 'reindexdb', '-j', '2', '-i', 'i1', 'postgres' ],
-	'parallel reindexdb cannot process indexes');
+$node->command_ok(
+	[ 'reindexdb', '-j', '2', '-i', 's1.i1', '-i', 's2.i2', 'postgres' ],
+	'parallel reindexdb for indices');
 # Note that the ordering of the commands is not stable, so the second
 # command for s2.t2 is not checked after.
 $node->issues_sql_like(
-- 
2.39.3 (Apple Git-145)

Alexander Korotkov

aekorotkov@gmail.com

almost 2 years ago

In reply to: Alexander Korotkov (#6)

1 attachment(s)

Re: Add Index-level REINDEX with multiple jobs

On Wed, Mar 20, 2024 at 7:19 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Mon, Mar 11, 2024 at 3:44 PM Maxim Orlov <orlovmg@gmail.com> wrote:

On Tue, 6 Feb 2024 at 09:22, Michael Paquier <michael@paquier.xyz> wrote:

The problem may be actually trickier than that, no? Could there be
other factors to take into account for their classification, like
their sizes (typically, we'd want to process the biggest one first, I
guess)?

Sorry for a late reply. Thanks for an explanation. This is sounds reasonable to me.
Svetlana had addressed this in the patch v2.

I think this patch is a nice improvement. But it doesn't seem to be
implemented in the right way. There is no guarantee that
get_parallel_object_list() will return tables in the same order as
indexes. Especially when there is "ORDER BY idx.relpages". Also,
sort_indices_by_tables() has quadratic complexity (probably OK since
input list shouldn't be too lengthy) and a bit awkward.

I've revised the patchset. Now appropriate ordering is made in SQL
query. The original list of indexes is modified to match the list of
tables. The tables are ordered by the size of its greatest index,
within table indexes are ordered by size.

I'm going to further revise this patch, mostly comments and the commit message.

Here goes the revised patch. I'm going to push this if there are no objections.

------
Regards,
Alexander Korotkov

Attachments:

v4-0001-Add-the-index-level-REINDEX-with-multiple-jobs.patchapplication/octet-stream; name=v4-0001-Add-the-index-level-REINDEX-with-multiple-jobs.patchDownload

From b80af9b0b83815ac0a7e23f8385049f35ad911a3 Mon Sep 17 00:00:00 2001
From: Alexander Korotkov <akorotkov@postgresql.org>
Date: Fri, 22 Mar 2024 17:42:44 +0200
Subject: [PATCH v4] Add the index-level REINDEX with multiple jobs

Straight-forward index-level REINDEX is not supported with multiple jobs as
we cannot control the concurrent processing of multiple indexes depending on
the same relation.  Instead, we dedicate the whole table to certain reindex
jobs. Thus, if indexes in the lists belong to the different tables, that gives
us a fair level of parallelism.

This commit teaches get_parallel_object_list() to fetch table names for
indexes in the case of index-level REINDE.  The same tables are grouped
together in the output order, and the list of indexes is also rebuilt to
match that order. Later during that list processing, we push indexes
belonging to the same table into the same job.

Discussion: https://postgr.es/m/CACG%3DezZU_VwDi-1PN8RUSE6mcYG%2BYx1NH_rJO4%2BKe-mKqLp%3DNw%40mail.gmail.com
Author: Maxim Orlov, Svetlana Derevyanko, Alexander Korotkov
Reviewed-by: Michael Paquier
---
 doc/src/sgml/ref/reindexdb.sgml    |   3 +-
 src/bin/scripts/reindexdb.c        | 146 ++++++++++++++++++++++++-----
 src/bin/scripts/t/090_reindexdb.pl |   8 +-
 3 files changed, 130 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/ref/reindexdb.sgml b/doc/src/sgml/ref/reindexdb.sgml
index a877439dc5b..98c3333228f 100644
--- a/doc/src/sgml/ref/reindexdb.sgml
+++ b/doc/src/sgml/ref/reindexdb.sgml
@@ -179,8 +179,7 @@ PostgreSQL documentation
         setting is high enough to accommodate all connections.
        </para>
        <para>
-        Note that this option is incompatible with the <option>--index</option>
-        and <option>--system</option> options.
+        Note that this option is incompatible with the <option>--system</option> option.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index 231e5c8fd02..1a09c3a487e 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -206,19 +206,8 @@ main(int argc, char *argv[])
 
 	setup_cancel_handler(NULL);
 
-	if (concurrentCons > 1)
-	{
-		/*
-		 * Index-level REINDEX is not supported with multiple jobs as we
-		 * cannot control the concurrent processing of multiple indexes
-		 * depending on the same relation.
-		 */
-		if (indexes.head != NULL)
-			pg_fatal("cannot use multiple jobs to reindex indexes");
-
-		if (syscatalog)
-			pg_fatal("cannot use multiple jobs to reindex system catalogs");
-	}
+	if (concurrentCons > 1 && syscatalog)
+		pg_fatal("cannot use multiple jobs to reindex system catalogs");
 
 	if (alldb)
 	{
@@ -258,7 +247,7 @@ main(int argc, char *argv[])
 		if (indexes.head != NULL)
 			reindex_one_database(&cparams, REINDEX_INDEX, &indexes,
 								 progname, echo, verbose,
-								 concurrently, 1, tablespace);
+								 concurrently, concurrentCons, tablespace);
 
 		if (tables.head != NULL)
 			reindex_one_database(&cparams, REINDEX_TABLE, &tables,
@@ -288,12 +277,16 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 {
 	PGconn	   *conn;
 	SimpleStringListCell *cell;
+	SimpleStringListCell *indices_tables_cell;
 	bool		parallel = concurrentCons > 1;
 	SimpleStringList *process_list = user_list;
+	SimpleStringList *indices_tables_list = NULL;
 	ReindexType process_type = type;
 	ParallelSlotArray *sa;
 	bool		failed = false;
 	int			items_count = 0;
+	char	   *prev_index_table_name = NULL;
+	ParallelSlot *free_slot = NULL;
 
 	conn = connectDatabase(cparams, progname, echo, false, true);
 
@@ -363,8 +356,25 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 					return;
 				break;
 
-			case REINDEX_SYSTEM:
 			case REINDEX_INDEX:
+				Assert(user_list != NULL);
+
+				/*
+				 * Build a list of relations from the indices.  This will
+				 * accordingly reorder the list of indices too.
+				 */
+				indices_tables_list = get_parallel_object_list(conn, process_type,
+															   user_list, echo);
+
+				if (indices_tables_list)
+					indices_tables_cell = indices_tables_list->head;
+
+				/* Bail out if nothing to process */
+				if (process_list == NULL)
+					return;
+				break;
+
+			case REINDEX_SYSTEM:
 				/* not supported */
 				Assert(false);
 				break;
@@ -404,7 +414,7 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 	do
 	{
 		const char *objname = cell->val;
-		ParallelSlot *free_slot = NULL;
+		bool		need_new_slot = true;
 
 		if (CancelRequested)
 		{
@@ -412,14 +422,33 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 			goto finish;
 		}
 
-		free_slot = ParallelSlotsGetIdle(sa, NULL);
-		if (!free_slot)
+		/*
+		 * For parallel index-level REINDEX, the indices of the same table are
+		 * ordered together and they are to be processed by the same job. So,
+		 * we don't switch the job as soon as the index belongs to the same
+		 * table as the previous one.
+		 */
+		if (parallel && process_type == REINDEX_INDEX)
+		{
+			if (prev_index_table_name != NULL &&
+				strcmp(prev_index_table_name, indices_tables_cell->val) == 0)
+				need_new_slot = false;
+			prev_index_table_name = indices_tables_cell->val;
+			indices_tables_cell = indices_tables_cell->next;
+		}
+
+		if (need_new_slot)
 		{
-			failed = true;
-			goto finish;
+			free_slot = ParallelSlotsGetIdle(sa, NULL);
+			if (!free_slot)
+			{
+				failed = true;
+				goto finish;
+			}
+
+			ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
 		}
 
-		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
 		run_reindex_command(free_slot->connection, process_type, objname,
 							echo, verbose, concurrently, true, tablespace);
 
@@ -436,6 +465,12 @@ finish:
 		pg_free(process_list);
 	}
 
+	if (indices_tables_list)
+	{
+		simple_string_list_destroy(indices_tables_list);
+		pg_free(indices_tables_list);
+	}
+
 	ParallelSlotsTerminate(sa);
 	pfree(sa);
 
@@ -647,8 +682,61 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 			}
 			break;
 
-		case REINDEX_SYSTEM:
 		case REINDEX_INDEX:
+			{
+				SimpleStringListCell *cell;
+
+				Assert(user_list != NULL);
+
+				/*
+				 * Straight-forward index-level REINDEX is not supported with
+				 * multiple jobs as we cannot control the concurrent
+				 * processing of multiple indexes depending on the same
+				 * relation.  But we can extract the appropriate table name
+				 * for the index and put REINDEX INDEX commands into different
+				 * jobs, according to the parent tables.
+				 *
+				 * We will order the results to group the same tables
+				 * together. We fetch index names as well to build a new list
+				 * of them with matching order.
+				 */
+				appendPQExpBufferStr(&catalog_query,
+									 "SELECT t.relname, n.nspname, i.relname\n"
+									 "FROM pg_catalog.pg_index x\n"
+									 "JOIN pg_catalog.pg_class t ON t.oid = x.indrelid\n"
+									 "JOIN pg_catalog.pg_class i ON i.oid = x.indexrelid\n"
+									 "LEFT JOIN pg_catalog.pg_namespace n ON n.oid = t.relnamespace\n"
+									 "WHERE x.indexrelid OPERATOR(pg_catalog.=) ANY(ARRAY['");
+
+				for (cell = user_list->head; cell; cell = cell->next)
+				{
+					if (cell != user_list->head)
+						appendPQExpBufferStr(&catalog_query, "', '");
+
+					appendQualifiedRelation(&catalog_query, cell->val, conn, echo);
+				}
+
+				/*
+				 * Order tables by the size of its greatest index.  Within the
+				 * table, order indexes by their sizes.
+				 */
+				appendPQExpBufferStr(&catalog_query,
+									 "']::pg_catalog.regclass[])\n"
+									 "ORDER BY max(i.relpages) OVER \n"
+									 "    (PARTITION BY n.nspname, t.relname),\n"
+									 "  n.nspname, t.relname, i.relpages;\n");
+
+				/*
+				 * We're going to re-order the user_list to match the order of
+				 * tables.  So, empty the user_list to fill it from the query
+				 * result.
+				 */
+				simple_string_list_destroy(user_list);
+				user_list->head = user_list->tail = NULL;
+			}
+			break;
+
+		case REINDEX_SYSTEM:
 		case REINDEX_TABLE:
 			Assert(false);
 			break;
@@ -680,6 +768,20 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 
 		simple_string_list_append(tables, buf.data);
 		resetPQExpBuffer(&buf);
+
+		if (type == REINDEX_INDEX)
+		{
+			/*
+			 * For index-level REINDEX, rebuild the list of indexes to match
+			 * the order of tables list.
+			 */
+			appendPQExpBufferStr(&buf,
+								 fmtQualifiedId(PQgetvalue(res, i, 1),
+												PQgetvalue(res, i, 2)));
+
+			simple_string_list_append(user_list, buf.data);
+			resetPQExpBuffer(&buf);
+		}
 	}
 	termPQExpBuffer(&buf);
 	PQclear(res);
diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl
index 429dd3acd68..f02822da218 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -236,9 +236,11 @@ $node->safe_psql(
 	CREATE SCHEMA s1;
 	CREATE TABLE s1.t1(id integer);
 	CREATE INDEX ON s1.t1(id);
+	CREATE INDEX i1 ON s1.t1(id);
 	CREATE SCHEMA s2;
 	CREATE TABLE s2.t2(id integer);
 	CREATE INDEX ON s2.t2(id);
+	CREATE INDEX i2 ON s2.t2(id);
 	-- empty schema
 	CREATE SCHEMA s3;
 |);
@@ -246,9 +248,9 @@ $node->safe_psql(
 $node->command_fails(
 	[ 'reindexdb', '-j', '2', '-s', 'postgres' ],
 	'parallel reindexdb cannot process system catalogs');
-$node->command_fails(
-	[ 'reindexdb', '-j', '2', '-i', 'i1', 'postgres' ],
-	'parallel reindexdb cannot process indexes');
+$node->command_ok(
+	[ 'reindexdb', '-j', '2', '-i', 's1.i1', '-i', 's2.i2', 'postgres' ],
+	'parallel reindexdb for indices');
 # Note that the ordering of the commands is not stable, so the second
 # command for s2.t2 is not checked after.
 $node->issues_sql_like(
-- 
2.39.3 (Apple Git-145)

Tom Lane

tgl@sss.pgh.pa.us

almost 2 years ago

In reply to: Alexander Korotkov (#7)

Re: Add Index-level REINDEX with multiple jobs

Alexander Korotkov <aekorotkov@gmail.com> writes:

Here goes the revised patch. I'm going to push this if there are no objections.

Quite a lot of the buildfarm is complaining about this:

reindexdb.c: In function 'reindex_one_database':
reindexdb.c:434:54: error: 'indices_tables_cell' may be used uninitialized in this function [-Werror=maybe-uninitialized]
434 | strcmp(prev_index_table_name, indices_tables_cell->val) == 0)
| ~~~~~~~~~~~~~~~~~~~^~~~~

I noticed it first on mamba, which is set up with -Werror, but a
scrape of the buildfarm logs shows many other animals reporting this
as a warning. I think they are right. Even granting that the
compiler realizes that "parallel && process_type == REINDEX_INDEX" is
enough to reach the one place where indices_tables_cell is
initialized, that's not really enough, because that place is

if (indices_tables_list)
indices_tables_cell = indices_tables_list->head;

So I believe this code will crash if get_parallel_object_list returns
an empty list. Initializing indices_tables_cell to NULL in its
declaration would stop the compiler warning, but if I'm right it
will do nothing to prevent that crash. This needs a bit more effort.

regards, tom lane

Richard Guo

guofenglinux@gmail.com

almost 2 years ago

In reply to: Tom Lane (#8)

Re: Add Index-level REINDEX with multiple jobs

On Mon, Mar 25, 2024 at 10:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

Here goes the revised patch. I'm going to push this if there are no

objections.

Quite a lot of the buildfarm is complaining about this:

reindexdb.c: In function 'reindex_one_database':
reindexdb.c:434:54: error: 'indices_tables_cell' may be used uninitialized
in this function [-Werror=maybe-uninitialized]
434 | strcmp(prev_index_table_name, indices_tables_cell->val) == 0)
| ~~~~~~~~~~~~~~~~~~~^~~~~

I noticed it first on mamba, which is set up with -Werror, but a
scrape of the buildfarm logs shows many other animals reporting this
as a warning.

I noticed the similar warning on cfbot:
https://cirrus-ci.com/task/6298504306360320?logs=gcc_warning#L448

reindexdb.c: In function ‘reindex_one_database’:
reindexdb.c:437:24: error: ‘indices_tables_cell’ may be used uninitialized
in this function [-Werror=maybe-uninitialized]
437 | indices_tables_cell = indices_tables_cell->next;
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

Although it's complaining on line 437 not 434, I think they are the same
issue.

I think they are right. Even granting that the
compiler realizes that "parallel && process_type == REINDEX_INDEX" is
enough to reach the one place where indices_tables_cell is
initialized, that's not really enough, because that place is

if (indices_tables_list)
indices_tables_cell = indices_tables_list->head;

So I believe this code will crash if get_parallel_object_list returns
an empty list. Initializing indices_tables_cell to NULL in its
declaration would stop the compiler warning, but if I'm right it
will do nothing to prevent that crash. This needs a bit more effort.

Agreed. And the comment of get_parallel_object_list() says that it may
indeed return NULL.

BTW, on line 373, it checks 'process_list' and bails out if this list is
NULL. But it seems to me that 'process_list' cannot be NULL in this
case, because it's initialized to be 'user_list' and we have asserted
that user_list is not NULL on line 360. I wonder if we should check
indices_tables_list instead of process_list on line 373.

Thanks
Richard

#10

Alexander Korotkov

aekorotkov@gmail.com

almost 2 years ago

In reply to: Richard Guo (#9)

Re: Add Index-level REINDEX with multiple jobs

On Mon, Mar 25, 2024 at 4:47 AM Richard Guo <guofenglinux@gmail.com> wrote:

On Mon, Mar 25, 2024 at 10:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Alexander Korotkov <aekorotkov@gmail.com> writes:

Here goes the revised patch. I'm going to push this if there are no objections.

Quite a lot of the buildfarm is complaining about this:

reindexdb.c: In function 'reindex_one_database':
reindexdb.c:434:54: error: 'indices_tables_cell' may be used uninitialized in this function [-Werror=maybe-uninitialized]
434 | strcmp(prev_index_table_name, indices_tables_cell->val) == 0)
| ~~~~~~~~~~~~~~~~~~~^~~~~

I noticed it first on mamba, which is set up with -Werror, but a
scrape of the buildfarm logs shows many other animals reporting this
as a warning.

I noticed the similar warning on cfbot:
https://cirrus-ci.com/task/6298504306360320?logs=gcc_warning#L448

reindexdb.c: In function ‘reindex_one_database’:
reindexdb.c:437:24: error: ‘indices_tables_cell’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
437 | indices_tables_cell = indices_tables_cell->next;
| ~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~

Although it's complaining on line 437 not 434, I think they are the same
issue.

I think they are right. Even granting that the
compiler realizes that "parallel && process_type == REINDEX_INDEX" is
enough to reach the one place where indices_tables_cell is
initialized, that's not really enough, because that place is

if (indices_tables_list)
indices_tables_cell = indices_tables_list->head;

So I believe this code will crash if get_parallel_object_list returns
an empty list. Initializing indices_tables_cell to NULL in its
declaration would stop the compiler warning, but if I'm right it
will do nothing to prevent that crash. This needs a bit more effort.

Agreed. And the comment of get_parallel_object_list() says that it may
indeed return NULL.

BTW, on line 373, it checks 'process_list' and bails out if this list is
NULL. But it seems to me that 'process_list' cannot be NULL in this
case, because it's initialized to be 'user_list' and we have asserted
that user_list is not NULL on line 360. I wonder if we should check
indices_tables_list instead of process_list on line 373.

Thank you. I'm going to deal with this in the next few hours.

------
Regards,
Alexander Korotkov