REINDEX backend filtering

Started by Julien Rouhaudabout 5 years ago40 messages

rjuju123@gmail.com

about 5 years ago

1 attachment(s)

Hello,

Now that we have the infrastructure to track indexes that might be corrupted
due to changes in collation libraries, I think it would be a good idea to offer
an easy way for users to reindex all indexes that might be corrupted.

I'm attaching a POC patch as a discussion basis. It implements a new
"COLLATION" option to reindex, with "not_current" being the only accepted
value. Note that I didn't spent too much efforts on the grammar part yet.

So for instance you can do:

REINDEX (COLLATION 'not_current') DATABASE mydb;

The filter is also implemented so that you could cumulate multiple filters, so
it could be easy to add more filtering, for instance:

REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;

to only rebuild indexes depending on outdated libc collations, or

REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;

to only rebuild indexes depending on a specific version of libc.

Attachments:

v1-0001-Add-a-new-COLLATION-option-to-REINDEX.patchtext/plain; charset=us-asciiDownload

From 5acf42e15c0dc8b185547ff9cb9371a86a057ec9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v1] Add a new COLLATION option to REINDEX.

---
 doc/src/sgml/ref/reindex.sgml              | 13 +++++
 src/backend/catalog/index.c                | 59 +++++++++++++++++++++-
 src/backend/commands/indexcmds.c           | 12 +++--
 src/backend/utils/cache/relcache.c         | 43 ++++++++++++++++
 src/include/catalog/index.h                |  6 ++-
 src/include/utils/relcache.h               |  1 +
 src/test/regress/expected/create_index.out | 10 ++++
 src/test/regress/sql/create_index.sql      | 10 ++++
 8 files changed, 149 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 6e1cf06713..eb8da9c070 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -25,6 +25,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
+    COLLATION [ <replaceable class="parameter">text</replaceable> ]
     CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
 </synopsis>
@@ -168,6 +169,18 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>COLLATION</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild.  The
+      only allowed value is <literal>'not_current'</literal>, which will only
+      process indexes that depend on a collation version different than the
+      current one.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>CONCURRENTLY</literal></term>
     <listitem>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 731610c701..7d941f40af 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -99,6 +99,12 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+	Oid relid;	/* targetr index oid */
+	bool deprecated;	/* depends on at least on deprected collation? */
+} IndexHasDeprecatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1349,6 +1355,57 @@ index_check_collation_versions(Oid relid)
 	list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one deprecated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_deprecated_collation(const ObjectAddress *otherObject,
+										const char *version,
+										char **new_version,
+										void *data)
+{
+	IndexHasDeprecatedColl *context = data;
+	char *current_version;
+
+	/* We only care about dependencies on collations. */
+	if (otherObject->classId != CollationRelationId)
+		return false;
+
+	/* Fast exit if we already found a deprecated collation version. */
+	if (context->deprecated)
+		return false;
+
+	/* Ask the provider for the current version.  Give up if unsupported. */
+	current_version = get_collation_version_for_oid(otherObject->objectId);
+	if (!current_version)
+		return false;
+
+	if (!version || strcmp(version, current_version) != 0)
+		context->deprecated = true;
+
+	return false;
+}
+
+bool
+index_has_deprecated_collation(Oid relid)
+{
+	ObjectAddress object;
+	IndexHasDeprecatedColl context;
+
+	object.classId = RelationRelationId;
+	object.objectId = relid;
+	object.objectSubId = 0;
+
+	context.relid = relid;
+	context.deprecated = false;
+
+	visitDependenciesOf(&object, &do_check_index_has_deprecated_collation,
+						&context);
+
+	return context.deprecated;
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3886,7 +3943,7 @@ reindex_relation(Oid relid, int flags, int options)
 	 * relcache to get this with a sequential scan if ignoring system
 	 * indexes.)
 	 */
-	indexIds = RelationGetIndexList(rel);
+	indexIds = RelationGetIndexListFiltered(rel, options);
 
 	if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 14d24b3cc4..2477ad6c74 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2462,6 +2462,7 @@ ReindexParseOptions(ParseState *pstate, ReindexStmt *stmt)
 	int			options = 0;
 	bool		concurrently = false;
 	bool		verbose = false;
+	bool		coll_filter = false;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -2472,6 +2473,9 @@ ReindexParseOptions(ParseState *pstate, ReindexStmt *stmt)
 			verbose = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "concurrently") == 0)
 			concurrently = defGetBoolean(opt);
+		else if ((strcmp(opt->defname, "collation") == 0)
+				 && (strcmp(defGetString(opt), "not_current") == 0))
+			coll_filter = true;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -2482,7 +2486,8 @@ ReindexParseOptions(ParseState *pstate, ReindexStmt *stmt)
 
 	options =
 		(verbose ? REINDEXOPT_VERBOSE : 0) |
-		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+		(concurrently ? REINDEXOPT_CONCURRENTLY : 0) |
+		(coll_filter ? REINDEXOPT_COLL_NOT_CURRENT : 0);
 
 	return options;
 }
@@ -3150,7 +3155,7 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 											  ShareUpdateExclusiveLock);
 
 				/* Add all the valid indexes of relation to list */
-				foreach(lc, RelationGetIndexList(heapRelation))
+				foreach(lc, RelationGetIndexListFiltered(heapRelation, options))
 				{
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
@@ -3196,7 +3201,8 @@ ReindexRelationConcurrently(Oid relationOid, int options)
 
 					MemoryContextSwitchTo(oldcontext);
 
-					foreach(lc2, RelationGetIndexList(toastRelation))
+					foreach(lc2, RelationGetIndexListFiltered(toastRelation,
+															  options))
 					{
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 66393becfb..9045dce2d7 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4609,6 +4609,49 @@ RelationGetIndexList(Relation relation)
 	return result;
 }
 
+/*
+ * RelationGetIndexListFiltered -- get a filtered list of indexes on this
+ * relation.
+ *
+ * Calls RelationGetIndexList and filters out the result as required by the
+ * caller.  The filters are cumulative, although for now the only supported
+ * filter is for indexes that don't depend on deprecated collation version.
+ */
+List *
+RelationGetIndexListFiltered(Relation relation, int options)
+{
+	List	   *result,
+			   *full_list;
+	ListCell   *lc;
+
+	full_list = RelationGetIndexList(relation);
+
+	/* Fast exit if no filtering was asked, or if the list if empty. */
+	if (!reindexHasFilter(options) || full_list == NIL)
+		return full_list;
+
+	result = NIL;
+	foreach(lc, full_list)
+	{
+		Oid		indexOid = lfirst_oid(lc);
+
+		/*
+		 * The caller wants to discard indexes that don't depend on a
+		 * deprecated collation version.
+		 */
+		if ((options & REINDEXOPT_COLL_NOT_CURRENT) != 0)
+		{
+			if (!index_has_deprecated_collation(indexOid))
+				continue;
+		}
+
+		/* Index passed all filters, keep it */
+		result = lappend_oid(result, indexOid);
+	}
+
+	return result;
+}
+
 /*
  * RelationGetStatExtList
  *		get a list of OIDs of statistics objects on this relation
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index c041628049..9f2bb60343 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -35,9 +35,12 @@ typedef enum ReindexOption
 	REINDEXOPT_VERBOSE = 1 << 0,	/* print progress info */
 	REINDEXOPT_REPORT_PROGRESS = 1 << 1,	/* report pgstat progress */
 	REINDEXOPT_MISSING_OK = 1 << 2, /* skip missing relations */
-	REINDEXOPT_CONCURRENTLY = 1 << 3	/* concurrent mode */
+	REINDEXOPT_CONCURRENTLY = 1 << 3,	/* concurrent mode */
+	REINDEXOPT_COLL_NOT_CURRENT = 1 << 4
 } ReindexOption;
 
+#define reindexHasFilter(x)		((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)
+
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
 {
@@ -131,6 +134,7 @@ extern void FormIndexDatum(IndexInfo *indexInfo,
 						   bool *isnull);
 
 extern void index_check_collation_versions(Oid relid);
+extern bool index_has_deprecated_collation(Oid relid);
 extern void index_update_collation_versions(Oid relid, Oid coll);
 
 extern void index_build(Relation heapRelation,
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 9a85b7dd57..4ef02c9851 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -45,6 +45,7 @@ extern void RelationClose(Relation relation);
  */
 extern List *RelationGetFKeyList(Relation relation);
 extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetIndexListFiltered(Relation relation, int options);
 extern List *RelationGetStatExtList(Relation relation);
 extern Oid	RelationGetPrimaryKeyIndex(Relation relation);
 extern Oid	RelationGetReplicaIndex(Relation relation);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index fc6afab58a..39a5b91d1a 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2018,6 +2018,16 @@ INFO:  index "reindex_verbose_pkey" was reindexed
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 --
+-- REINDEX (COLLATION 'not_current')
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (COLLATION 'not_current') TABLE reindex_coll;
+NOTICE:  table "reindex_coll" has no indexes to reindex
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+--
 -- REINDEX CONCURRENTLY
 --
 CREATE TABLE concur_reindex_tab (c1 int);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 824cb9f9e8..99f45c0d02 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -790,6 +790,16 @@ REINDEX (VERBOSE) TABLE reindex_verbose;
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 
+--
+-- REINDEX (COLLATION 'not_current')
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (COLLATION 'not_current') TABLE reindex_coll;
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+
 --
 -- REINDEX CONCURRENTLY
 --
-- 
2.20.1

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Julien Rouhaud (#1)

Re: REINDEX backend filtering

On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote:

Now that we have the infrastructure to track indexes that might be corrupted
due to changes in collation libraries, I think it would be a good idea to offer
an easy way for users to reindex all indexes that might be corrupted.

Yes. It would be a good thing.

The filter is also implemented so that you could cumulate multiple filters, so
it could be easy to add more filtering, for instance:

REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;

to only rebuild indexes depending on outdated libc collations, or

REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;

to only rebuild indexes depending on a specific version of libc.

Deciding on the grammar to use depends on the use cases we would like
to satisfy. From what I heard on this topic, the goal is to reduce
the amount of time necessary to reindex a system so as REINDEX only
works on indexes whose dependent collation versions are not known or
works on indexes in need of a collation refresh (like a reindexdb
--all --collation -j $jobs). What would be the benefit in having more
complexity with library-dependent settings while we could take care
of the use cases that matter the most with a simple grammar? Perhaps
"not_current" is not the best match as a keyword, we could just use
"collation" and handle that as a boolean. As long as we don't need
new operators in the grammar rules..
--
Michael

Julien Rouhaud

rjuju123@gmail.com

about 5 years ago

In reply to: Michael Paquier (#2)

Re: REINDEX backend filtering

On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote:

Now that we have the infrastructure to track indexes that might be corrupted
due to changes in collation libraries, I think it would be a good idea to offer
an easy way for users to reindex all indexes that might be corrupted.

Yes. It would be a good thing.

The filter is also implemented so that you could cumulate multiple filters, so
it could be easy to add more filtering, for instance:

REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;

to only rebuild indexes depending on outdated libc collations, or

REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;

to only rebuild indexes depending on a specific version of libc.

Deciding on the grammar to use depends on the use cases we would like
to satisfy. From what I heard on this topic, the goal is to reduce
the amount of time necessary to reindex a system so as REINDEX only
works on indexes whose dependent collation versions are not known or
works on indexes in need of a collation refresh (like a reindexdb
--all --collation -j $jobs). What would be the benefit in having more
complexity with library-dependent settings while we could take care
of the use cases that matter the most with a simple grammar? Perhaps
"not_current" is not the best match as a keyword, we could just use
"collation" and handle that as a boolean. As long as we don't need
new operators in the grammar rules..

I'm not sure what the DBA usual pattern here. If the reindexing
runtime is really critical, I'm assuming that at least some people
will dig into library details to see what are the collations that
actually broke in the last upgrade and will want to reindex only
those, and force the version for the rest of the indexes. And
obviously, they probably won't wait to have multiple collation
versions dependencies before taking care of that. In that case the
filters that would matters would be one to only keep indexes with an
outdated collation version, and an additional one for a specific
collation name. Or we could have the COLLATION keyword without
additional argument mean all outdated collations, and COLLATION
'collation_name' to specify a specific one. This is maybe a bit ugly,
and would probably require a different approach for reindexdb.

Magnus Hagander

magnus@hagander.net

about 5 years ago

In reply to: Julien Rouhaud (#3)

Re: REINDEX backend filtering

On Tue, Dec 15, 2020 at 12:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Dec 14, 2020 at 3:45 PM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Dec 03, 2020 at 05:31:43PM +0800, Julien Rouhaud wrote:

Now that we have the infrastructure to track indexes that might be corrupted
due to changes in collation libraries, I think it would be a good idea to offer
an easy way for users to reindex all indexes that might be corrupted.

Yes. It would be a good thing.

The filter is also implemented so that you could cumulate multiple filters, so
it could be easy to add more filtering, for instance:

REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;

to only rebuild indexes depending on outdated libc collations, or

REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;

to only rebuild indexes depending on a specific version of libc.

Deciding on the grammar to use depends on the use cases we would like
to satisfy. From what I heard on this topic, the goal is to reduce
the amount of time necessary to reindex a system so as REINDEX only
works on indexes whose dependent collation versions are not known or
works on indexes in need of a collation refresh (like a reindexdb
--all --collation -j $jobs). What would be the benefit in having more
complexity with library-dependent settings while we could take care
of the use cases that matter the most with a simple grammar? Perhaps
"not_current" is not the best match as a keyword, we could just use
"collation" and handle that as a boolean. As long as we don't need
new operators in the grammar rules..

I'm not sure what the DBA usual pattern here. If the reindexing
runtime is really critical, I'm assuming that at least some people
will dig into library details to see what are the collations that
actually broke in the last upgrade and will want to reindex only
those, and force the version for the rest of the indexes. And
obviously, they probably won't wait to have multiple collation
versions dependencies before taking care of that. In that case the
filters that would matters would be one to only keep indexes with an
outdated collation version, and an additional one for a specific
collation name. Or we could have the COLLATION keyword without
additional argument mean all outdated collations, and COLLATION
'collation_name' to specify a specific one. This is maybe a bit ugly,
and would probably require a different approach for reindexdb.

Is this really a common enough operation that we need it i the main grammar?

Having the functionality, definitely, but what if it was "just" a
function instead? So you'd do something like:
SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
\gexec

Or even a function that returns the REINDEX commands directly (taking
a parameter to turn on/off concurrency for example).

That also seems like it would be easier to make flexible, and just as
easy to plug into reindexdb?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Michael Paquier

michael@paquier.xyz

about 5 years ago

In reply to: Magnus Hagander (#4)

Re: REINDEX backend filtering

On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:

Is this really a common enough operation that we need it in the main grammar?

Having the functionality, definitely, but what if it was "just" a
function instead? So you'd do something like:
SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
\gexec

Or even a function that returns the REINDEX commands directly (taking
a parameter to turn on/off concurrency for example).

That also seems like it would be easier to make flexible, and just as
easy to plug into reindexdb?

Having control in the grammar to choose which index to reindex for a
table is very useful when it comes to parallel reindexing, because
it is no-brainer in terms of knowing which index to distribute to one
job or another. In short, with this grammar you can just issue a set
of REINDEX TABLE commands that we know will not conflict with each
other. You cannot get that level of control with REINDEX INDEX as it
may be possible that more or more commands conflict if they work on an
index of the same relation because it is required to take lock also on
the parent table. Of course, we could decide to implement a
redistribution logic in all frontend tools that need such things, like
reindexdb, but that's not something I think we should let the client
decide of. A backend-side filtering is IMO much simpler, less code,
and more elegant.
--
Michael

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Michael Paquier (#5)

1 attachment(s)

Re: REINDEX backend filtering

On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:

Is this really a common enough operation that we need it in the main grammar?

Having the functionality, definitely, but what if it was "just" a
function instead? So you'd do something like:
SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
\gexec

Or even a function that returns the REINDEX commands directly (taking
a parameter to turn on/off concurrency for example).

That also seems like it would be easier to make flexible, and just as
easy to plug into reindexdb?

Having control in the grammar to choose which index to reindex for a
table is very useful when it comes to parallel reindexing, because
it is no-brainer in terms of knowing which index to distribute to one
job or another. In short, with this grammar you can just issue a set
of REINDEX TABLE commands that we know will not conflict with each
other. You cannot get that level of control with REINDEX INDEX as it
may be possible that more or more commands conflict if they work on an
index of the same relation because it is required to take lock also on
the parent table. Of course, we could decide to implement a
redistribution logic in all frontend tools that need such things, like
reindexdb, but that's not something I think we should let the client
decide of. A backend-side filtering is IMO much simpler, less code,
and more elegant.

Maybe additional filtering capabilities is not something that will be
required frequently, but I'm pretty sure that reindexing only indexes
that might be corrupt is something that will be required often.. So I
agree, having all that logic in the backend makes everything easier
for users, having the choice of the tools they want to issue the query
while still having all features available.

There was a conflict with a3dc926009be8 (Refactor option handling of
CLUSTER, REINDEX and VACUUM), so rebased version attached. No other
changes included yet.

Attachments:

v2-0001-Add-a-new-COLLATION-option-to-REINDEX.patchapplication/octet-stream; name=v2-0001-Add-a-new-COLLATION-option-to-REINDEX.patchDownload

From a6de650a776daf0013d9ad31f1c78ab2a3623771 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v2] Add a new COLLATION option to REINDEX.

---
 doc/src/sgml/ref/reindex.sgml              | 13 +++++
 src/backend/catalog/index.c                | 59 +++++++++++++++++++++-
 src/backend/commands/indexcmds.c           | 13 +++--
 src/backend/utils/cache/relcache.c         | 43 ++++++++++++++++
 src/include/catalog/index.h                |  4 ++
 src/include/utils/relcache.h               |  1 +
 src/test/regress/expected/create_index.out | 10 ++++
 src/test/regress/sql/create_index.sql      | 10 ++++
 8 files changed, 149 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 627b36300c..505102482e 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -25,6 +25,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
+    COLLATION [ <replaceable class="parameter">text</replaceable> ]
     CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
 </synopsis>
@@ -168,6 +169,18 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>COLLATION</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild.  The
+      only allowed value is <literal>'not_current'</literal>, which will only
+      process indexes that depend on a collation version different than the
+      current one.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>CONCURRENTLY</literal></term>
     <listitem>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index b8cd35e995..11ed68b8ea 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -99,6 +99,12 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+	Oid relid;	/* targetr index oid */
+	bool deprecated;	/* depends on at least on deprected collation? */
+} IndexHasDeprecatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1349,6 +1355,57 @@ index_check_collation_versions(Oid relid)
 	list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one deprecated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_deprecated_collation(const ObjectAddress *otherObject,
+										const char *version,
+										char **new_version,
+										void *data)
+{
+	IndexHasDeprecatedColl *context = data;
+	char *current_version;
+
+	/* We only care about dependencies on collations. */
+	if (otherObject->classId != CollationRelationId)
+		return false;
+
+	/* Fast exit if we already found a deprecated collation version. */
+	if (context->deprecated)
+		return false;
+
+	/* Ask the provider for the current version.  Give up if unsupported. */
+	current_version = get_collation_version_for_oid(otherObject->objectId);
+	if (!current_version)
+		return false;
+
+	if (!version || strcmp(version, current_version) != 0)
+		context->deprecated = true;
+
+	return false;
+}
+
+bool
+index_has_deprecated_collation(Oid relid)
+{
+	ObjectAddress object;
+	IndexHasDeprecatedColl context;
+
+	object.classId = RelationRelationId;
+	object.objectId = relid;
+	object.objectSubId = 0;
+
+	context.relid = relid;
+	context.deprecated = false;
+
+	visitDependenciesOf(&object, &do_check_index_has_deprecated_collation,
+						&context);
+
+	return context.deprecated;
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3886,7 +3943,7 @@ reindex_relation(Oid relid, int flags, ReindexParams *params)
 	 * relcache to get this with a sequential scan if ignoring system
 	 * indexes.)
 	 */
-	indexIds = RelationGetIndexList(rel);
+	indexIds = RelationGetIndexListFiltered(rel, params->options);
 
 	if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index f9f3ff3b62..1d9886b4e3 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2474,6 +2474,7 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 	ListCell   *lc;
 	bool		concurrently = false;
 	bool		verbose = false;
+	bool		coll_filter = false;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -2484,6 +2485,9 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 			verbose = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "concurrently") == 0)
 			concurrently = defGetBoolean(opt);
+		else if ((strcmp(opt->defname, "collation") == 0)
+				 && (strcmp(defGetString(opt), "not_current") == 0))
+			coll_filter = true;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -2498,7 +2502,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 
 	params.options =
 		(verbose ? REINDEXOPT_VERBOSE : 0) |
-		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+		(concurrently ? REINDEXOPT_CONCURRENTLY : 0) |
+		(coll_filter ? REINDEXOPT_COLL_NOT_CURRENT : 0);
 
 	switch (stmt->kind)
 	{
@@ -3211,7 +3216,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 											  ShareUpdateExclusiveLock);
 
 				/* Add all the valid indexes of relation to list */
-				foreach(lc, RelationGetIndexList(heapRelation))
+				foreach(lc, RelationGetIndexListFiltered(heapRelation,
+														 params->options))
 				{
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
@@ -3263,7 +3269,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 
 					MemoryContextSwitchTo(oldcontext);
 
-					foreach(lc2, RelationGetIndexList(toastRelation))
+					foreach(lc2, RelationGetIndexListFiltered(toastRelation,
+															  params->options))
 					{
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..efef6c5ae5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4620,6 +4620,49 @@ RelationGetIndexList(Relation relation)
 	return result;
 }
 
+/*
+ * RelationGetIndexListFiltered -- get a filtered list of indexes on this
+ * relation.
+ *
+ * Calls RelationGetIndexList and filters out the result as required by the
+ * caller.  The filters are cumulative, although for now the only supported
+ * filter is for indexes that don't depend on deprecated collation version.
+ */
+List *
+RelationGetIndexListFiltered(Relation relation, bits32 options)
+{
+	List	   *result,
+			   *full_list;
+	ListCell   *lc;
+
+	full_list = RelationGetIndexList(relation);
+
+	/* Fast exit if no filtering was asked, or if the list if empty. */
+	if (!reindexHasFilter(options) || full_list == NIL)
+		return full_list;
+
+	result = NIL;
+	foreach(lc, full_list)
+	{
+		Oid		indexOid = lfirst_oid(lc);
+
+		/*
+		 * The caller wants to discard indexes that don't depend on a
+		 * deprecated collation version.
+		 */
+		if ((options & REINDEXOPT_COLL_NOT_CURRENT) != 0)
+		{
+			if (!index_has_deprecated_collation(indexOid))
+				continue;
+		}
+
+		/* Index passed all filters, keep it */
+		result = lappend_oid(result, indexOid);
+	}
+
+	return result;
+}
+
 /*
  * RelationGetStatExtList
  *		get a list of OIDs of statistics objects on this relation
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index 266f8950dc..740c281ddd 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -40,6 +40,9 @@ typedef struct ReindexParams
 #define REINDEXOPT_REPORT_PROGRESS 0x02 /* report pgstat progress */
 #define REINDEXOPT_MISSING_OK 	0x04	/* skip missing relations */
 #define REINDEXOPT_CONCURRENTLY	0x08	/* concurrent mode */
+#define REINDEXOPT_COLL_NOT_CURRENT 0x10/* outdated collation only */
+
+#define reindexHasFilter(x)		((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)
 
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
@@ -134,6 +137,7 @@ extern void FormIndexDatum(IndexInfo *indexInfo,
 						   bool *isnull);
 
 extern void index_check_collation_versions(Oid relid);
+extern bool index_has_deprecated_collation(Oid relid);
 extern void index_update_collation_versions(Oid relid, Oid coll);
 
 extern void index_build(Relation heapRelation,
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79323..a7a2272abd 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -45,6 +45,7 @@ extern void RelationClose(Relation relation);
  */
 extern List *RelationGetFKeyList(Relation relation);
 extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetIndexListFiltered(Relation relation, bits32 options);
 extern List *RelationGetStatExtList(Relation relation);
 extern Oid	RelationGetPrimaryKeyIndex(Relation relation);
 extern Oid	RelationGetReplicaIndex(Relation relation);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index fc6afab58a..39a5b91d1a 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2018,6 +2018,16 @@ INFO:  index "reindex_verbose_pkey" was reindexed
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 --
+-- REINDEX (COLLATION 'not_current')
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (COLLATION 'not_current') TABLE reindex_coll;
+NOTICE:  table "reindex_coll" has no indexes to reindex
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+--
 -- REINDEX CONCURRENTLY
 --
 CREATE TABLE concur_reindex_tab (c1 int);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 824cb9f9e8..99f45c0d02 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -790,6 +790,16 @@ REINDEX (VERBOSE) TABLE reindex_verbose;
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 
+--
+-- REINDEX (COLLATION 'not_current')
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (COLLATION 'not_current') TABLE reindex_coll;
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+
 --
 -- REINDEX CONCURRENTLY
 --
-- 
2.20.1

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#6)

1 attachment(s)

Re: REINDEX backend filtering

On Thu, Jan 21, 2021 at 11:12:56AM +0800, Julien Rouhaud wrote:

There was a conflict with a3dc926009be8 (Refactor option handling of
CLUSTER, REINDEX and VACUUM), so rebased version attached. No other
changes included yet.

New conflict, v3 attached.

Attachments:

v3-0001-Add-a-new-COLLATION-option-to-REINDEX.patchtext/plain; charset=us-asciiDownload

From 63afe51453d4413ad7e73c66268e6ff12bfe5436 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v3] Add a new COLLATION option to REINDEX.

---
 doc/src/sgml/ref/reindex.sgml              | 13 +++++
 src/backend/catalog/index.c                | 59 +++++++++++++++++++++-
 src/backend/commands/indexcmds.c           | 13 +++--
 src/backend/utils/cache/relcache.c         | 43 ++++++++++++++++
 src/include/catalog/index.h                |  4 ++
 src/include/utils/relcache.h               |  1 +
 src/test/regress/expected/create_index.out | 10 ++++
 src/test/regress/sql/create_index.sql      | 10 ++++
 8 files changed, 149 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index 07795b5737..ead2904b67 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -25,6 +25,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
+    COLLATION [ <replaceable class="parameter">text</replaceable> ]
     CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
     TABLESPACE <replaceable class="parameter">new_tablespace</replaceable>
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -169,6 +170,18 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>COLLATION</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild.  The
+      only allowed value is <literal>'not_current'</literal>, which will only
+      process indexes that depend on a collation version different than the
+      current one.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>CONCURRENTLY</literal></term>
     <listitem>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1cb9172a5f..b74ee79d38 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -100,6 +100,12 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+	Oid relid;	/* targetr index oid */
+	bool deprecated;	/* depends on at least on deprected collation? */
+} IndexHasDeprecatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1350,6 +1356,57 @@ index_check_collation_versions(Oid relid)
 	list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one deprecated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_deprecated_collation(const ObjectAddress *otherObject,
+										const char *version,
+										char **new_version,
+										void *data)
+{
+	IndexHasDeprecatedColl *context = data;
+	char *current_version;
+
+	/* We only care about dependencies on collations. */
+	if (otherObject->classId != CollationRelationId)
+		return false;
+
+	/* Fast exit if we already found a deprecated collation version. */
+	if (context->deprecated)
+		return false;
+
+	/* Ask the provider for the current version.  Give up if unsupported. */
+	current_version = get_collation_version_for_oid(otherObject->objectId);
+	if (!current_version)
+		return false;
+
+	if (!version || strcmp(version, current_version) != 0)
+		context->deprecated = true;
+
+	return false;
+}
+
+bool
+index_has_deprecated_collation(Oid relid)
+{
+	ObjectAddress object;
+	IndexHasDeprecatedColl context;
+
+	object.classId = RelationRelationId;
+	object.objectId = relid;
+	object.objectSubId = 0;
+
+	context.relid = relid;
+	context.deprecated = false;
+
+	visitDependenciesOf(&object, &do_check_index_has_deprecated_collation,
+						&context);
+
+	return context.deprecated;
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3930,7 +3987,7 @@ reindex_relation(Oid relid, int flags, ReindexParams *params)
 	 * relcache to get this with a sequential scan if ignoring system
 	 * indexes.)
 	 */
-	indexIds = RelationGetIndexList(rel);
+	indexIds = RelationGetIndexListFiltered(rel, params->options);
 
 	if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 127ba7835d..9bf69ff9d7 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2475,6 +2475,7 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 	bool		concurrently = false;
 	bool		verbose = false;
 	char	   *tablespacename = NULL;
+	bool		coll_filter = false;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -2487,6 +2488,9 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 			concurrently = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "tablespace") == 0)
 			tablespacename = defGetString(opt);
+		else if ((strcmp(opt->defname, "collation") == 0)
+				 && (strcmp(defGetString(opt), "not_current") == 0))
+			coll_filter = true;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -2501,7 +2505,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 
 	params.options =
 		(verbose ? REINDEXOPT_VERBOSE : 0) |
-		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+		(concurrently ? REINDEXOPT_CONCURRENTLY : 0) |
+		(coll_filter ? REINDEXOPT_COLL_NOT_CURRENT : 0);
 
 	/*
 	 * Assign the tablespace OID to move indexes to, with InvalidOid to do
@@ -3298,7 +3303,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 									RelationGetRelationName(heapRelation))));
 
 				/* Add all the valid indexes of relation to list */
-				foreach(lc, RelationGetIndexList(heapRelation))
+				foreach(lc, RelationGetIndexListFiltered(heapRelation,
+														 params->options))
 				{
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
@@ -3350,7 +3356,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 
 					MemoryContextSwitchTo(oldcontext);
 
-					foreach(lc2, RelationGetIndexList(toastRelation))
+					foreach(lc2, RelationGetIndexListFiltered(toastRelation,
+															  params->options))
 					{
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..efef6c5ae5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4620,6 +4620,49 @@ RelationGetIndexList(Relation relation)
 	return result;
 }
 
+/*
+ * RelationGetIndexListFiltered -- get a filtered list of indexes on this
+ * relation.
+ *
+ * Calls RelationGetIndexList and filters out the result as required by the
+ * caller.  The filters are cumulative, although for now the only supported
+ * filter is for indexes that don't depend on deprecated collation version.
+ */
+List *
+RelationGetIndexListFiltered(Relation relation, bits32 options)
+{
+	List	   *result,
+			   *full_list;
+	ListCell   *lc;
+
+	full_list = RelationGetIndexList(relation);
+
+	/* Fast exit if no filtering was asked, or if the list if empty. */
+	if (!reindexHasFilter(options) || full_list == NIL)
+		return full_list;
+
+	result = NIL;
+	foreach(lc, full_list)
+	{
+		Oid		indexOid = lfirst_oid(lc);
+
+		/*
+		 * The caller wants to discard indexes that don't depend on a
+		 * deprecated collation version.
+		 */
+		if ((options & REINDEXOPT_COLL_NOT_CURRENT) != 0)
+		{
+			if (!index_has_deprecated_collation(indexOid))
+				continue;
+		}
+
+		/* Index passed all filters, keep it */
+		result = lappend_oid(result, indexOid);
+	}
+
+	return result;
+}
+
 /*
  * RelationGetStatExtList
  *		get a list of OIDs of statistics objects on this relation
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e22d506436..3f08e5f454 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -42,6 +42,9 @@ typedef struct ReindexParams
 #define REINDEXOPT_REPORT_PROGRESS 0x02 /* report pgstat progress */
 #define REINDEXOPT_MISSING_OK 	0x04	/* skip missing relations */
 #define REINDEXOPT_CONCURRENTLY	0x08	/* concurrent mode */
+#define REINDEXOPT_COLL_NOT_CURRENT 0x10/* outdated collation only */
+
+#define reindexHasFilter(x)		((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)
 
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
@@ -137,6 +140,7 @@ extern void FormIndexDatum(IndexInfo *indexInfo,
 						   bool *isnull);
 
 extern void index_check_collation_versions(Oid relid);
+extern bool index_has_deprecated_collation(Oid relid);
 extern void index_update_collation_versions(Oid relid, Oid coll);
 
 extern void index_build(Relation heapRelation,
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79323..a7a2272abd 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -45,6 +45,7 @@ extern void RelationClose(Relation relation);
  */
 extern List *RelationGetFKeyList(Relation relation);
 extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetIndexListFiltered(Relation relation, bits32 options);
 extern List *RelationGetStatExtList(Relation relation);
 extern Oid	RelationGetPrimaryKeyIndex(Relation relation);
 extern Oid	RelationGetReplicaIndex(Relation relation);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index ce734f7ef3..3339fcb831 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2018,6 +2018,16 @@ INFO:  index "reindex_verbose_pkey" was reindexed
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 --
+-- REINDEX (COLLATION 'not_current')
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (COLLATION 'not_current') TABLE reindex_coll;
+NOTICE:  table "reindex_coll" has no indexes to reindex
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+--
 -- REINDEX CONCURRENTLY
 --
 CREATE TABLE concur_reindex_tab (c1 int);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index fd4f30876e..01d193b9de 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -790,6 +790,16 @@ REINDEX (VERBOSE) TABLE reindex_verbose;
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 
+--
+-- REINDEX (COLLATION 'not_current')
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (COLLATION 'not_current') TABLE reindex_coll;
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+
 --
 -- REINDEX CONCURRENTLY
 --
-- 
2.20.1

Zhihong Yu

zyu@yugabyte.com

almost 5 years ago

In reply to: Julien Rouhaud (#7)

Re: REINDEX backend filtering

Hi,
For index_has_deprecated_collation(),

+ object.objectSubId = 0;

The objectSubId field is not accessed
by do_check_index_has_deprecated_collation(). Does it need to be assigned ?

For RelationGetIndexListFiltered(), it seems when (options &
REINDEXOPT_COLL_NOT_CURRENT) == 0, the full_list would be returned.
This can be checked prior to entering the foreach loop.

Cheers

On Sat, Feb 6, 2021 at 11:20 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Show quoted text

On Thu, Jan 21, 2021 at 11:12:56AM +0800, Julien Rouhaud wrote:

There was a conflict with a3dc926009be8 (Refactor option handling of
CLUSTER, REINDEX and VACUUM), so rebased version attached. No other
changes included yet.

New conflict, v3 attached.

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Zhihong Yu (#8)

1 attachment(s)

Re: REINDEX backend filtering

Hi,

Thanks for the review!

On Mon, Feb 8, 2021 at 12:14 AM Zhihong Yu <zyu@yugabyte.com> wrote:

Hi,
For index_has_deprecated_collation(),

+ object.objectSubId = 0;

The objectSubId field is not accessed by do_check_index_has_deprecated_collation(). Does it need to be assigned ?

Indeed it's not strictly necessary I think, but it makes things
cleaner and future proof, and that's how things are already done
nearby. So I think it's better to keep it this way.

For RelationGetIndexListFiltered(), it seems when (options & REINDEXOPT_COLL_NOT_CURRENT) == 0, the full_list would be returned.
This can be checked prior to entering the foreach loop.

That's already the case with this test:

/* Fast exit if no filtering was asked, or if the list if empty. */
if (!reindexHasFilter(options) || full_list == NIL)
return full_list;

knowing that

#define reindexHasFilter(x) ((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)

The code as-is written to be extensible with possibly other filters
(e.g. specific library or specific version). Feedback so far seems to
indicate that it may be overkill and only filtering indexes with
deprecated collation is enough. I'll simplify this code in a future
version, getting rid of reindexHasFilter, unless someone thinks more
filter is a good idea.

For now I'm attaching a rebased version, there was a conflict with the
recent patch to add the missing_ok parameter to
get_collation_version_for_oid()

Attachments:

v4-0001-Add-a-new-COLLATION-option-to-REINDEX.patchapplication/octet-stream; name=v4-0001-Add-a-new-COLLATION-option-to-REINDEX.patchDownload

From 1470d3544cffd841305ef34443853efb4d7ca31b Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v4] Add a new COLLATION option to REINDEX.

---
 doc/src/sgml/ref/reindex.sgml              | 13 +++++
 src/backend/catalog/index.c                | 60 +++++++++++++++++++++-
 src/backend/commands/indexcmds.c           | 13 +++--
 src/backend/utils/cache/relcache.c         | 43 ++++++++++++++++
 src/include/catalog/index.h                |  4 ++
 src/include/utils/relcache.h               |  1 +
 src/test/regress/expected/create_index.out | 10 ++++
 src/test/regress/sql/create_index.sql      | 10 ++++
 8 files changed, 150 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index b22d39eba9..3ced85c95c 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -25,6 +25,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
 
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
+    COLLATION [ <replaceable class="parameter">text</replaceable> ]
     CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
     TABLESPACE <replaceable class="parameter">new_tablespace</replaceable>
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
@@ -169,6 +170,18 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>COLLATION</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild.  The
+      only allowed value is <literal>'not_current'</literal>, which will only
+      process indexes that depend on a collation version different than the
+      current one.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>CONCURRENTLY</literal></term>
     <listitem>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4ef61b5efd..36759a97b0 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -100,6 +100,12 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+	Oid relid;	/* targetr index oid */
+	bool deprecated;	/* depends on at least on deprected collation? */
+} IndexHasDeprecatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1351,6 +1357,58 @@ index_check_collation_versions(Oid relid)
 	list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one deprecated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_deprecated_collation(const ObjectAddress *otherObject,
+										const char *version,
+										char **new_version,
+										void *data)
+{
+	IndexHasDeprecatedColl *context = data;
+	char *current_version;
+
+	/* We only care about dependencies on collations. */
+	if (otherObject->classId != CollationRelationId)
+		return false;
+
+	/* Fast exit if we already found a deprecated collation version. */
+	if (context->deprecated)
+		return false;
+
+	/* Ask the provider for the current version.  Give up if unsupported. */
+	current_version = get_collation_version_for_oid(otherObject->objectId,
+													false);
+	if (!current_version)
+		return false;
+
+	if (!version || strcmp(version, current_version) != 0)
+		context->deprecated = true;
+
+	return false;
+}
+
+bool
+index_has_deprecated_collation(Oid relid)
+{
+	ObjectAddress object;
+	IndexHasDeprecatedColl context;
+
+	object.classId = RelationRelationId;
+	object.objectId = relid;
+	object.objectSubId = 0;
+
+	context.relid = relid;
+	context.deprecated = false;
+
+	visitDependenciesOf(&object, &do_check_index_has_deprecated_collation,
+						&context);
+
+	return context.deprecated;
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3991,7 +4049,7 @@ reindex_relation(Oid relid, int flags, ReindexParams *params)
 	 * relcache to get this with a sequential scan if ignoring system
 	 * indexes.)
 	 */
-	indexIds = RelationGetIndexList(rel);
+	indexIds = RelationGetIndexListFiltered(rel, params->options);
 
 	if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8bc652ecd3..d536854576 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2484,6 +2484,7 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 	bool		concurrently = false;
 	bool		verbose = false;
 	char	   *tablespacename = NULL;
+	bool		coll_filter = false;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -2496,6 +2497,9 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 			concurrently = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "tablespace") == 0)
 			tablespacename = defGetString(opt);
+		else if ((strcmp(opt->defname, "collation") == 0)
+				 && (strcmp(defGetString(opt), "not_current") == 0))
+			coll_filter = true;
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -2510,7 +2514,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 
 	params.options =
 		(verbose ? REINDEXOPT_VERBOSE : 0) |
-		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+		(concurrently ? REINDEXOPT_CONCURRENTLY : 0) |
+		(coll_filter ? REINDEXOPT_COLL_NOT_CURRENT : 0);
 
 	/*
 	 * Assign the tablespace OID to move indexes to, with InvalidOid to do
@@ -3307,7 +3312,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 									RelationGetRelationName(heapRelation))));
 
 				/* Add all the valid indexes of relation to list */
-				foreach(lc, RelationGetIndexList(heapRelation))
+				foreach(lc, RelationGetIndexListFiltered(heapRelation,
+														 params->options))
 				{
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
@@ -3359,7 +3365,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 
 					MemoryContextSwitchTo(oldcontext);
 
-					foreach(lc2, RelationGetIndexList(toastRelation))
+					foreach(lc2, RelationGetIndexListFiltered(toastRelation,
+															  params->options))
 					{
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..efef6c5ae5 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4620,6 +4620,49 @@ RelationGetIndexList(Relation relation)
 	return result;
 }
 
+/*
+ * RelationGetIndexListFiltered -- get a filtered list of indexes on this
+ * relation.
+ *
+ * Calls RelationGetIndexList and filters out the result as required by the
+ * caller.  The filters are cumulative, although for now the only supported
+ * filter is for indexes that don't depend on deprecated collation version.
+ */
+List *
+RelationGetIndexListFiltered(Relation relation, bits32 options)
+{
+	List	   *result,
+			   *full_list;
+	ListCell   *lc;
+
+	full_list = RelationGetIndexList(relation);
+
+	/* Fast exit if no filtering was asked, or if the list if empty. */
+	if (!reindexHasFilter(options) || full_list == NIL)
+		return full_list;
+
+	result = NIL;
+	foreach(lc, full_list)
+	{
+		Oid		indexOid = lfirst_oid(lc);
+
+		/*
+		 * The caller wants to discard indexes that don't depend on a
+		 * deprecated collation version.
+		 */
+		if ((options & REINDEXOPT_COLL_NOT_CURRENT) != 0)
+		{
+			if (!index_has_deprecated_collation(indexOid))
+				continue;
+		}
+
+		/* Index passed all filters, keep it */
+		result = lappend_oid(result, indexOid);
+	}
+
+	return result;
+}
+
 /*
  * RelationGetStatExtList
  *		get a list of OIDs of statistics objects on this relation
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e22d506436..3f08e5f454 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -42,6 +42,9 @@ typedef struct ReindexParams
 #define REINDEXOPT_REPORT_PROGRESS 0x02 /* report pgstat progress */
 #define REINDEXOPT_MISSING_OK 	0x04	/* skip missing relations */
 #define REINDEXOPT_CONCURRENTLY	0x08	/* concurrent mode */
+#define REINDEXOPT_COLL_NOT_CURRENT 0x10/* outdated collation only */
+
+#define reindexHasFilter(x)		((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)
 
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
@@ -137,6 +140,7 @@ extern void FormIndexDatum(IndexInfo *indexInfo,
 						   bool *isnull);
 
 extern void index_check_collation_versions(Oid relid);
+extern bool index_has_deprecated_collation(Oid relid);
 extern void index_update_collation_versions(Oid relid, Oid coll);
 
 extern void index_build(Relation heapRelation,
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79323..a7a2272abd 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -45,6 +45,7 @@ extern void RelationClose(Relation relation);
  */
 extern List *RelationGetFKeyList(Relation relation);
 extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetIndexListFiltered(Relation relation, bits32 options);
 extern List *RelationGetStatExtList(Relation relation);
 extern Oid	RelationGetPrimaryKeyIndex(Relation relation);
 extern Oid	RelationGetReplicaIndex(Relation relation);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 830fdddf24..1c9f22ee37 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2018,6 +2018,16 @@ INFO:  index "reindex_verbose_pkey" was reindexed
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 --
+-- REINDEX (COLLATION 'not_current')
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (COLLATION 'not_current') TABLE reindex_coll;
+NOTICE:  table "reindex_coll" has no indexes to reindex
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+--
 -- REINDEX CONCURRENTLY
 --
 CREATE TABLE concur_reindex_tab (c1 int);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 8bc76f7c6f..fa9e05dba4 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -790,6 +790,16 @@ REINDEX (VERBOSE) TABLE reindex_verbose;
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 
+--
+-- REINDEX (COLLATION 'not_current')
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (COLLATION 'not_current') TABLE reindex_coll;
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+
 --
 -- REINDEX CONCURRENTLY
 --
-- 
2.20.1

#10

mariakatosvich

loveneet.singh@redblink.com

almost 5 years ago

In reply to: Magnus Hagander (#4)

Re: REINDEX backend filtering

From what I heard on this topic, the goal is to reduce
the amount of time necessary to reindex a system so as REINDEX only
works on indexes whose dependent collation versions are not known or
works on indexes in need of a collation refresh (like a reindexdb
--all --collation -j $jobs).

--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

#11

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: mariakatosvich (#10)

Re: REINDEX backend filtering

Hi,

On Thu, Feb 25, 2021 at 12:11 AM mariakatosvich
<loveneet.singh@redblink.com> wrote:

From what I heard on this topic, the goal is to reduce
the amount of time necessary to reindex a system so as REINDEX only
works on indexes whose dependent collation versions are not known or
works on indexes in need of a collation refresh (like a reindexdb
--all --collation -j $jobs).

That's indeed the goal. The current patch only adds infrastructure
for the REINDEX command, which will make easy to add the option for
reindexdb. I'll add the reindexdb part too in the next version of the
patch.

#12

Darafei "Komяpa" Praliaskouski

me@komzpa.net

almost 5 years ago

In reply to: Julien Rouhaud (#1)

Re: REINDEX backend filtering

Hello,

The PostGIS project needed this from time to time. Would be great if
reindex by opclass can be made possible.

We changed the semantics of btree at least twice (in 2.4 and 3.0), fixed
some ND mixed-dimension indexes semantics in 3.0, fixed hash index on 32
bit arch in 3.0.

On Thu, Dec 3, 2020 at 12:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

Hello,

Now that we have the infrastructure to track indexes that might be
corrupted
due to changes in collation libraries, I think it would be a good idea to
offer
an easy way for users to reindex all indexes that might be corrupted.

I'm attaching a POC patch as a discussion basis. It implements a new
"COLLATION" option to reindex, with "not_current" being the only accepted
value. Note that I didn't spent too much efforts on the grammar part yet.

So for instance you can do:

REINDEX (COLLATION 'not_current') DATABASE mydb;

The filter is also implemented so that you could cumulate multiple
filters, so
it could be easy to add more filtering, for instance:

REINDEX (COLLATION 'libc', COLLATION 'not_current') DATABASE mydb;

to only rebuild indexes depending on outdated libc collations, or

REINDEX (COLLATION 'libc', VERSION 'X.Y') DATABASE mydb;

to only rebuild indexes depending on a specific version of libc.

--
Darafei "Komяpa" Praliaskouski
OSM BY Team - http://openstreetmap.by/

#13

Thomas Munro

thomas.munro@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#9)

Re: REINDEX backend filtering

On Thu, Feb 25, 2021 at 1:22 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

#define reindexHasFilter(x) ((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)

It's better to use "(x) & ..." in macros to avoid weird operator
precedence problems in future code.

It seems like there are several different names for similar things in
this patch: "outdated", "not current", "deprecated". Can we settle on
one, maybe "outdated"?

The code as-is written to be extensible with possibly other filters
(e.g. specific library or specific version). Feedback so far seems to
indicate that it may be overkill and only filtering indexes with
deprecated collation is enough. I'll simplify this code in a future
version, getting rid of reindexHasFilter, unless someone thinks more
filter is a good idea.

Hmm, yeah, I think it should probably be very general. Suppose someone
invents versioned dependencies for (say) functions or full text
stemmers etc, then do we want to add more syntax here to rebuild
indexes (assuming we don't use INVALID for such cases, IDK)? I don't
think we'd want to have more syntax just to be able to say "hey,
please fix my collation problems but not my stemmer problems". What
about just REINDEX (OUTDATED)? It's hard to find a single word that
means "depends on an object whose version has changed".

#14

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Darafei "Komяpa" Praliaskouski (#12)

Re: REINDEX backend filtering

Hi,

On Wed, Feb 24, 2021 at 09:34:59PM +0300, Darafei "Komяpa" Praliaskouski wrote:

Hello,

The PostGIS project needed this from time to time. Would be great if
reindex by opclass can be made possible.

We changed the semantics of btree at least twice (in 2.4 and 3.0), fixed
some ND mixed-dimension indexes semantics in 3.0, fixed hash index on 32
bit arch in 3.0.

Oh, I wasn't aware of that. Thanks for bringing this up!

Looking at the last occurence (c00f9525a3c6c) I see that the NEWS item does
mention the need to do a REINDEX. As far as I understand there wouldn't be any
hard error if one doesn't do to a REINDEX, and you'd end up with some kind
of "logical" corruption as the new lib version won't have the same semantics
depending on the number of dimensions, so more or less the same kind of
problems that would happen in case of breaking update of a collation library.

It seems to me that it's a legitimate use case, especially since GiST doesn't
have a metapage to store an index version. But I'm wondering if the right
answer is to allow REINDEX / reindexdb to look for specific opclass or to add
an API to let third party code register a custom dependency. In this case
it would be some kind of gist ABI versioning. We could then have a single
REINDEX option, like REINDEX (OUTDATED) as Thomas suggested in
/messages/by-id/CA+hUKG+WWioP6xV5Xf1pPhiWNGD1B7hdBBCdQoKfp=zymJajBQ@mail.gmail.com
for both cases.

#15

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Thomas Munro (#13)

Re: REINDEX backend filtering

On Thu, Feb 25, 2021 at 07:36:02AM +1300, Thomas Munro wrote:

On Thu, Feb 25, 2021 at 1:22 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

#define reindexHasFilter(x) ((x & REINDEXOPT_COLL_NOT_CURRENT) != 0)

It's better to use "(x) & ..." in macros to avoid weird operator
precedence problems in future code.

Ah indeed, thanks! I usually always protect the arguments wth parenthesis but
I somehow missed this one. I'll send a new version of the patch shortly with
the rest of the problems you mentioned.

It seems like there are several different names for similar things in
this patch: "outdated", "not current", "deprecated". Can we settle on
one, maybe "outdated"?

Oops, I apparently missed a lot of places during the various rewrite of the
patch. +1 for oudated.

The code as-is written to be extensible with possibly other filters
(e.g. specific library or specific version). Feedback so far seems to
indicate that it may be overkill and only filtering indexes with
deprecated collation is enough. I'll simplify this code in a future
version, getting rid of reindexHasFilter, unless someone thinks more
filter is a good idea.

Hmm, yeah, I think it should probably be very general. Suppose someone
invents versioned dependencies for (say) functions or full text
stemmers etc, then do we want to add more syntax here to rebuild
indexes (assuming we don't use INVALID for such cases, IDK)? I don't
think we'd want to have more syntax just to be able to say "hey,
please fix my collation problems but not my stemmer problems". What
about just REINDEX (OUTDATED)? It's hard to find a single word that
means "depends on an object whose version has changed".

That quite make sense. I agree that it would make the solution simpler and
better.

Looking at the other use case for PostGIS mentioned by Darafei, it seems that
it would help to make concept of index dependency extensible for third party
code too (see
/messages/by-id/20210226074531.dhkfneao2czzqk6n@nol).
Would that make sense?

#16

Magnus Hagander

magnus@hagander.net

almost 5 years ago

In reply to: Michael Paquier (#5)

Re: REINDEX backend filtering

On Wed, Dec 16, 2020 at 1:27 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:

Is this really a common enough operation that we need it in the main grammar?

Having the functionality, definitely, but what if it was "just" a
function instead? So you'd do something like:
SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
\gexec

Or even a function that returns the REINDEX commands directly (taking
a parameter to turn on/off concurrency for example).

That also seems like it would be easier to make flexible, and just as
easy to plug into reindexdb?

Having control in the grammar to choose which index to reindex for a
table is very useful when it comes to parallel reindexing, because
it is no-brainer in terms of knowing which index to distribute to one
job or another. In short, with this grammar you can just issue a set
of REINDEX TABLE commands that we know will not conflict with each
other. You cannot get that level of control with REINDEX INDEX as it

(oops, seems I forgot to reply to this one, sorry!)

Uh, isn't it almost exactly the opposite?

If you do it in the backend grammar you *cannot* parallelize it
between indexes, because you can only run one at a time.

Whereas if you do it in the frontend, you can. Either with something
like reindexdb that could do it automatically, or with psql as a
copy/paste job?

may be possible that more or more commands conflict if they work on an
index of the same relation because it is required to take lock also on
the parent table. Of course, we could decide to implement a
redistribution logic in all frontend tools that need such things, like
reindexdb, but that's not something I think we should let the client
decide of. A backend-side filtering is IMO much simpler, less code,
and more elegant.

It's simpler in the simple case, i agree with that. But ISTM it's also
a lot less flexible for anything except the simplest case...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#17

Magnus Hagander

magnus@hagander.net

almost 5 years ago

In reply to: Julien Rouhaud (#6)

Re: REINDEX backend filtering

On Thu, Jan 21, 2021 at 4:13 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Wed, Dec 16, 2020 at 8:27 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Dec 15, 2020 at 06:34:16PM +0100, Magnus Hagander wrote:

Is this really a common enough operation that we need it in the main grammar?

Having the functionality, definitely, but what if it was "just" a
function instead? So you'd do something like:
SELECT 'reindex index ' || i FROM pg_blah(some, arguments, here)
\gexec

Or even a function that returns the REINDEX commands directly (taking
a parameter to turn on/off concurrency for example).

That also seems like it would be easier to make flexible, and just as
easy to plug into reindexdb?

Having control in the grammar to choose which index to reindex for a
table is very useful when it comes to parallel reindexing, because
it is no-brainer in terms of knowing which index to distribute to one
job or another. In short, with this grammar you can just issue a set
of REINDEX TABLE commands that we know will not conflict with each
other. You cannot get that level of control with REINDEX INDEX as it
may be possible that more or more commands conflict if they work on an
index of the same relation because it is required to take lock also on
the parent table. Of course, we could decide to implement a
redistribution logic in all frontend tools that need such things, like
reindexdb, but that's not something I think we should let the client
decide of. A backend-side filtering is IMO much simpler, less code,
and more elegant.

Maybe additional filtering capabilities is not something that will be
required frequently, but I'm pretty sure that reindexing only indexes
that might be corrupt is something that will be required often.. So I
agree, having all that logic in the backend makes everything easier
for users, having the choice of the tools they want to issue the query
while still having all features available.

I agree with that scenario -- in that the most common case will be
exactly that of reindexing only indexes that might be corrupt.

I don't agree with the conclusion though.

The most common case of that will be in the case of an upgrade. In
that case I want to reindex all of those indexes as quickly as
possible. So I'll want to parallelize it across multiple sessions
(like reindexdb -j 4 or whatever). But doesn't putting the filter in
the grammar prevent me from doing exactly that? Each of those 4 (or
whatever) sessions would have to guess which would go where and then
speculatively run the command on that, instead of being able to
directly distribute the worload?

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#18

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Magnus Hagander (#17)

Re: REINDEX backend filtering

On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander <magnus@hagander.net> wrote:

I don't agree with the conclusion though.

The most common case of that will be in the case of an upgrade. In
that case I want to reindex all of those indexes as quickly as
possible. So I'll want to parallelize it across multiple sessions
(like reindexdb -j 4 or whatever). But doesn't putting the filter in
the grammar prevent me from doing exactly that? Each of those 4 (or
whatever) sessions would have to guess which would go where and then
speculatively run the command on that, instead of being able to
directly distribute the worload?

It means that you'll have to distribute the work on a per-table basis
rather than a per-index basis. The time spent to find out that a
table doesn't have any impacted index should be negligible compared to
the cost of running a reindex. This obviously won't help that much if
you have a lot of table but only one being gigantic.

But even if we put the logic in the client, this still won't help as
reindexdb doesn't support multiple job with an index list:

* Index-level REINDEX is not supported with multiple jobs as we
* cannot control the concurrent processing of multiple indexes
* depending on the same relation.
*/
if (concurrentCons > 1 && indexes.head != NULL)
{
pg_log_error("cannot use multiple jobs to reindex indexes");
exit(1);
}

#19

Magnus Hagander

magnus@hagander.net

almost 5 years ago

In reply to: Julien Rouhaud (#18)

Re: REINDEX backend filtering

On Fri, Feb 26, 2021 at 11:07 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Fri, Feb 26, 2021 at 5:50 PM Magnus Hagander <magnus@hagander.net> wrote:

I don't agree with the conclusion though.

The most common case of that will be in the case of an upgrade. In
that case I want to reindex all of those indexes as quickly as
possible. So I'll want to parallelize it across multiple sessions
(like reindexdb -j 4 or whatever). But doesn't putting the filter in
the grammar prevent me from doing exactly that? Each of those 4 (or
whatever) sessions would have to guess which would go where and then
speculatively run the command on that, instead of being able to
directly distribute the worload?

It means that you'll have to distribute the work on a per-table basis
rather than a per-index basis. The time spent to find out that a
table doesn't have any impacted index should be negligible compared to
the cost of running a reindex. This obviously won't help that much if
you have a lot of table but only one being gigantic.

Yeah -- or at least a couple of large and many small, which I find to
be a very common scenario. Or the case of some tables having many
affected indexes and some having few.

You'd basically want to order the operation by table on something like
"total size of the affected indexes on table x" -- which may very well
put a smaller table with many indexes earlier in the queue. But you
can't do that without having access to the filter....

But even if we put the logic in the client, this still won't help as
reindexdb doesn't support multiple job with an index list:

* Index-level REINDEX is not supported with multiple jobs as we
* cannot control the concurrent processing of multiple indexes
* depending on the same relation.
*/
if (concurrentCons > 1 && indexes.head != NULL)
{
pg_log_error("cannot use multiple jobs to reindex indexes");
exit(1);
}

That sounds like it would be a fixable problem though, in principle.
It could/should probably still limit all indexes on the same table to
be processed in the same connection for the locking reasons of course,
but doing an order by the total size of the indexes like above, and
ensuring that they are grouped that way, doesn't sound *that* hard. I
doubt it's that important in the current usecase of manually listing
the indexes, but it would be useful for something like this.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

#20

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Magnus Hagander (#19)

Re: REINDEX backend filtering

On Fri, Feb 26, 2021 at 11:17:26AM +0100, Magnus Hagander wrote:

On Fri, Feb 26, 2021 at 11:07 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

It means that you'll have to distribute the work on a per-table basis
rather than a per-index basis. The time spent to find out that a
table doesn't have any impacted index should be negligible compared to
the cost of running a reindex. This obviously won't help that much if
you have a lot of table but only one being gigantic.

Yeah -- or at least a couple of large and many small, which I find to
be a very common scenario. Or the case of some tables having many
affected indexes and some having few.

You'd basically want to order the operation by table on something like
"total size of the affected indexes on table x" -- which may very well
put a smaller table with many indexes earlier in the queue. But you
can't do that without having access to the filter....

So, long running reindex due to some gigantic and/or numerous indexes on a
single (or few) table is not something that we can solve, but inefficient
reindex due to wrong table size / to-be-reindexed-indexes-size correlation can
be addressed.

I would still prefer to go to backend implementation, so that all client tools
can benefit from it by default. We could simply export the current
index_has_oudated_collation(oid) function in sql, and tweak pg_dump to order
tables by the cumulated size of such indexes as you mentioned below, would
that work for you?

Also, given Thomas proposal in a nearby email this function would be renamed to
index_has_oudated_dependencies(oid) or something like that.

But even if we put the logic in the client, this still won't help as
reindexdb doesn't support multiple job with an index list:

* Index-level REINDEX is not supported with multiple jobs as we
* cannot control the concurrent processing of multiple indexes
* depending on the same relation.
*/
if (concurrentCons > 1 && indexes.head != NULL)
{
pg_log_error("cannot use multiple jobs to reindex indexes");
exit(1);
}

That sounds like it would be a fixable problem though, in principle.
It could/should probably still limit all indexes on the same table to
be processed in the same connection for the locking reasons of course,
but doing an order by the total size of the indexes like above, and
ensuring that they are grouped that way, doesn't sound *that* hard. I
doubt it's that important in the current usecase of manually listing
the indexes, but it would be useful for something like this.

Yeah, I don't think that in case of oudated dependency the --index will be
useful, it's likely that there will be too many indexes to process. We can
still try to improve reindexdb to be able to process index lists with parallel
connections, but I would rather keep that separated from this patch.

#21

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#20)

2 attachment(s)

Re: REINDEX backend filtering

On Tue, Mar 02, 2021 at 12:01:55PM +0800, Julien Rouhaud wrote:

So, long running reindex due to some gigantic and/or numerous indexes on a
single (or few) table is not something that we can solve, but inefficient
reindex due to wrong table size / to-be-reindexed-indexes-size correlation can
be addressed.

I would still prefer to go to backend implementation, so that all client tools
can benefit from it by default. We could simply export the current
index_has_oudated_collation(oid) function in sql, and tweak pg_dump to order
tables by the cumulated size of such indexes as you mentioned below, would
that work for you?

Also, given Thomas proposal in a nearby email this function would be renamed to
index_has_oudated_dependencies(oid) or something like that.

Please find attached v5 which address all previous comments:

- consistently use "outdated"
- use REINDEX (OUTDATED) grammar (with a new unreserved OUTDATED keyword)
- new --outdated option to reindexdb
- expose a new "pg_index_has_outdated_dependency(regclass)" SQL function
- use that function in reindexdb --outdated to sort tables by total
indexes-to-be-processed size

Attachments:

v5-0001-Add-a-new-OUTDATED-filtering-facility-for-REINDEX.patchtext/x-diff; charset=us-asciiDownload

From 5703ce209d414dd7a6fba18f581eca4671364834 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v5 1/2] Add a new OUTDATED filtering facility for REINDEX
 command.

OUTDATED is added a new unreserved keyword.

When used, REINDEX will only process indexes that have an outdated dependency.
For now, only dependency on collations are supported but we'll likely support
other kind of dependency in the future.

Also add a new pg_index_has_outdated_dependency(regclass) SQL function, so
client code can filter such indexes if needed.  This function will also be used
in a following commit to teach reindexdb to use this new OUTDATED option and
order the tables by the amount of work that will actually be done.

Catversion (should be) bumped.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by:
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 doc/src/sgml/func.sgml                     |  27 ++++--
 doc/src/sgml/ref/reindex.sgml              |  12 +++
 src/backend/catalog/index.c                | 107 ++++++++++++++++++++-
 src/backend/commands/indexcmds.c           |  12 ++-
 src/backend/parser/gram.y                  |   4 +-
 src/backend/utils/cache/relcache.c         |  40 ++++++++
 src/bin/psql/tab-complete.c                |   2 +-
 src/include/catalog/index.h                |   3 +
 src/include/catalog/pg_proc.dat            |   4 +
 src/include/parser/kwlist.h                |   1 +
 src/include/utils/relcache.h               |   1 +
 src/test/regress/expected/create_index.out |  10 ++
 src/test/regress/sql/create_index.sql      |  10 ++
 13 files changed, 221 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bf99f82149..2cf6e66234 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26381,12 +26381,13 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
 
    <para>
     <xref linkend="functions-admin-index-table"/> shows the functions
-    available for index maintenance tasks.  (Note that these maintenance
-    tasks are normally done automatically by autovacuum; use of these
-    functions is only required in special cases.)
-    These functions cannot be executed during recovery.
-    Use of these functions is restricted to superusers and the owner
-    of the given index.
+    available for index maintenance tasks.  (Note that the maintenance
+    tasks performing actions on indexes are normally done automatically by
+    autovacuum; use of these functions is only required in special cases.)
+    The functions performing actions on indexes cannot be executed during
+    recovery.
+    Use of the functions performing actions on indexes is restricted to
+    superusers and the owner of the given index.
    </para>
 
    <table id="functions-admin-index-table">
@@ -26471,6 +26472,20 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
         option.
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_index_has_outdated_dependency</primary>
+        </indexterm>
+        <function>pg_index_has_outdated_dependency</function> ( <parameter>index</parameter> <type>regclass</type> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Check if the specified index has any outdated dependency.  For now only
+        dependency on collations are supported.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index b22d39eba9..2d94d49cde 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -26,6 +26,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
     CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
+    OUTDATED [ <replaceable class="parameter">boolean</replaceable> ]
     TABLESPACE <replaceable class="parameter">new_tablespace</replaceable>
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
 </synopsis>
@@ -188,6 +189,17 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>OUTDATED</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild and only
+      process indexes that have outdated dependencies.  Fow now, the only
+      handle dependency is for the collation provider version.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>TABLESPACE</literal></term>
     <listitem>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4ef61b5efd..571feac5db 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -100,6 +100,12 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+	Oid relid;	/* targetr index oid */
+	bool outdated;	/* depends on at least on deprected collation? */
+} IndexHasOutdatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1351,6 +1357,105 @@ index_check_collation_versions(Oid relid)
 	list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one outdated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_outdated_collation(const ObjectAddress *otherObject,
+										const char *version,
+										char **new_version,
+										void *data)
+{
+	IndexHasOutdatedColl *context = data;
+	char *current_version;
+
+	/* We only care about dependencies on collations. */
+	if (otherObject->classId != CollationRelationId)
+		return false;
+
+	/* Fast exit if we already found a outdated collation version. */
+	if (context->outdated)
+		return false;
+
+	/* Ask the provider for the current version.  Give up if unsupported. */
+	current_version = get_collation_version_for_oid(otherObject->objectId,
+													false);
+	if (!current_version)
+		return false;
+
+	if (!version || strcmp(version, current_version) != 0)
+		context->outdated = true;
+
+	return false;
+}
+
+Datum
+pg_index_has_outdated_dependency(PG_FUNCTION_ARGS)
+{
+	Oid			indexOid = PG_GETARG_OID(0);
+	Relation	rel;
+	bool		isIndex;
+	bool		res;
+
+	rel = try_relation_open(indexOid, AccessShareLock);
+
+	if (rel == NULL)
+		PG_RETURN_NULL();
+
+	isIndex = rel->rd_rel->relkind == RELKIND_INDEX;
+
+	if (!isIndex)
+	{
+		relation_close(rel, AccessShareLock);
+		PG_RETURN_NULL();
+	}
+
+	res = index_has_outdated_dependency(indexOid);
+
+	relation_close(rel, AccessShareLock);
+
+	PG_RETURN_BOOL(res);
+}
+
+/*
+ * Check whether the given index has a dependency with an outdated
+ * collation version.
+ * Caller must hold a suitable lock and make sure that the given Oid belongs to
+ * an index.
+ */
+bool
+index_has_outdated_collation(Oid indexOid)
+{
+	ObjectAddress object;
+	IndexHasOutdatedColl context;
+
+	object.classId = RelationRelationId;
+	object.objectId = indexOid;
+	object.objectSubId = 0;
+
+	context.relid = indexOid;
+	context.outdated = false;
+
+	visitDependenciesOf(&object, &do_check_index_has_outdated_collation,
+						&context);
+
+	return context.outdated;
+}
+
+/*
+ * Check whether the given index has a dependency with an outdated
+ * refobjversion.
+ * Caller must hold a suitable lock and make sure that the given Oid belongs to
+ * an index.
+ * For now, only dependency on collations are supported.
+ */
+bool
+index_has_outdated_dependency(Oid indexOid)
+{
+	return index_has_outdated_collation(indexOid);
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3991,7 +4096,7 @@ reindex_relation(Oid relid, int flags, ReindexParams *params)
 	 * relcache to get this with a sequential scan if ignoring system
 	 * indexes.)
 	 */
-	indexIds = RelationGetIndexList(rel);
+	indexIds = RelationGetIndexListFiltered(rel, params->options);
 
 	if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8bc652ecd3..0e1bcad101 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2484,6 +2484,7 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 	bool		concurrently = false;
 	bool		verbose = false;
 	char	   *tablespacename = NULL;
+	bool		outdated_filter = false;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -2496,6 +2497,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 			concurrently = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "tablespace") == 0)
 			tablespacename = defGetString(opt);
+		else if (strcmp(opt->defname, "outdated") == 0)
+			outdated_filter = defGetBoolean(opt);
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -2510,7 +2513,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 
 	params.options =
 		(verbose ? REINDEXOPT_VERBOSE : 0) |
-		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+		(concurrently ? REINDEXOPT_CONCURRENTLY : 0) |
+		(outdated_filter ? REINDEXOPT_OUTDATED : 0);
 
 	/*
 	 * Assign the tablespace OID to move indexes to, with InvalidOid to do
@@ -3307,7 +3311,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 									RelationGetRelationName(heapRelation))));
 
 				/* Add all the valid indexes of relation to list */
-				foreach(lc, RelationGetIndexList(heapRelation))
+				foreach(lc, RelationGetIndexListFiltered(heapRelation,
+														 params->options))
 				{
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
@@ -3359,7 +3364,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 
 					MemoryContextSwitchTo(oldcontext);
 
-					foreach(lc2, RelationGetIndexList(toastRelation))
+					foreach(lc2, RelationGetIndexListFiltered(toastRelation,
+															  params->options))
 					{
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 652be0b96d..a67cfa867c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -674,7 +674,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	NULLS_P NUMERIC
 
 	OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR
-	ORDER ORDINALITY OTHERS OUT_P OUTER_P
+	ORDER ORDINALITY OTHERS OUT_P OUTDATED OUTER_P
 	OVER OVERLAPS OVERLAY OVERRIDING OWNED OWNER
 
 	PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY
@@ -15430,6 +15430,7 @@ unreserved_keyword:
 			| OPTIONS
 			| ORDINALITY
 			| OTHERS
+			| OUTDATED
 			| OVER
 			| OVERRIDING
 			| OWNED
@@ -16002,6 +16003,7 @@ bare_label_keyword:
 			| ORDINALITY
 			| OTHERS
 			| OUT_P
+			| OUTDATED
 			| OUTER_P
 			| OVERLAY
 			| OVERRIDING
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..06545ee550 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4620,6 +4620,46 @@ RelationGetIndexList(Relation relation)
 	return result;
 }
 
+/*
+ * RelationGetIndexListFiltered -- get a filtered list of indexes on this
+ * relation.
+ *
+ * Calls RelationGetIndexList and only keep indexes that have an outdated
+ * dependency.  For now, only collation version dependency is supported.
+ */
+List *
+RelationGetIndexListFiltered(Relation relation, bits32 options)
+{
+	List	   *result,
+			   *full_list;
+	ListCell   *lc;
+
+	full_list = RelationGetIndexList(relation);
+
+	/* Fast exit if no filtering was asked, or if the list if empty. */
+	if (((options & REINDEXOPT_OUTDATED) == 0) || full_list == NIL)
+		return full_list;
+
+	result = NIL;
+	foreach(lc, full_list)
+	{
+		Oid		indexOid = lfirst_oid(lc);
+
+		/*
+		 * Check for outdated collation version dependency.
+		 */
+		if (index_has_outdated_collation(indexOid))
+		{
+			result = lappend_oid(result, indexOid);
+			continue;
+		}
+
+		/* Didn't find any outdated dependency, index will be ignored. */
+	}
+
+	return result;
+}
+
 /*
  * RelationGetStatExtList
  *		get a list of OIDs of statistics objects on this relation
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 9f0208ac49..fac3e0476a 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3674,7 +3674,7 @@ psql_completion(const char *text, int start, int end)
 		 * one word, so the above test is correct.
 		 */
 		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
-			COMPLETE_WITH("CONCURRENTLY", "TABLESPACE", "VERBOSE");
+			COMPLETE_WITH("CONCURRENTLY", "OUTDATED'", "TABLESPACE", "VERBOSE");
 		else if (TailMatches("TABLESPACE"))
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e22d506436..298c0c633c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -42,6 +42,7 @@ typedef struct ReindexParams
 #define REINDEXOPT_REPORT_PROGRESS 0x02 /* report pgstat progress */
 #define REINDEXOPT_MISSING_OK 	0x04	/* skip missing relations */
 #define REINDEXOPT_CONCURRENTLY	0x08	/* concurrent mode */
+#define REINDEXOPT_OUTDATED		0x10/* outdated collation only */
 
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
@@ -137,6 +138,8 @@ extern void FormIndexDatum(IndexInfo *indexInfo,
 						   bool *isnull);
 
 extern void index_check_collation_versions(Oid relid);
+extern bool index_has_outdated_collation(Oid indexOid);
+extern bool index_has_outdated_dependency(Oid indexOid);
 extern void index_update_collation_versions(Oid relid, Oid coll);
 
 extern void index_build(Relation heapRelation,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 3d3974f467..4874d33996 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -949,6 +949,10 @@
   proname => 'pg_indexam_has_property', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid text',
   prosrc => 'pg_indexam_has_property' },
+{ oid => '8102', descr => 'test property of an index',
+  proname => 'pg_index_has_outdated_dependency', provolatile => 's',
+  prorettype => 'bool', proargtypes => 'regclass',
+  prosrc => 'pg_index_has_outdated_dependency' },
 { oid => '637', descr => 'test property of an index',
   proname => 'pg_index_has_property', provolatile => 's', prorettype => 'bool',
   proargtypes => 'regclass text', prosrc => 'pg_index_has_property' },
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 28083aaac9..e6c725d9a6 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -295,6 +295,7 @@ PG_KEYWORD("order", ORDER, RESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("ordinality", ORDINALITY, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("others", OTHERS, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("out", OUT_P, COL_NAME_KEYWORD, BARE_LABEL)
+PG_KEYWORD("outdated", OUTDATED, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("outer", OUTER_P, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
 PG_KEYWORD("over", OVER, UNRESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("overlaps", OVERLAPS, TYPE_FUNC_NAME_KEYWORD, AS_LABEL)
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79323..a7a2272abd 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -45,6 +45,7 @@ extern void RelationClose(Relation relation);
  */
 extern List *RelationGetFKeyList(Relation relation);
 extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetIndexListFiltered(Relation relation, bits32 options);
 extern List *RelationGetStatExtList(Relation relation);
 extern Oid	RelationGetPrimaryKeyIndex(Relation relation);
 extern Oid	RelationGetReplicaIndex(Relation relation);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 830fdddf24..5f7c49c650 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2018,6 +2018,16 @@ INFO:  index "reindex_verbose_pkey" was reindexed
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 --
+-- REINDEX (OUTDATED)
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (OUTDATED) TABLE reindex_coll;
+NOTICE:  table "reindex_coll" has no indexes to reindex
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+--
 -- REINDEX CONCURRENTLY
 --
 CREATE TABLE concur_reindex_tab (c1 int);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 8bc76f7c6f..808cacee5d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -790,6 +790,16 @@ REINDEX (VERBOSE) TABLE reindex_verbose;
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 
+--
+-- REINDEX (OUTDATED)
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (OUTDATED) TABLE reindex_coll;
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+
 --
 -- REINDEX CONCURRENTLY
 --
-- 
2.30.1

v5-0002-Add-a-outdated-option-to-reindexdb.patchtext/x-diff; charset=us-asciiDownload

From 2f1388251a84badbc888fddfacb7c5357fcec873 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 25 Feb 2021 01:33:58 +0800
Subject: [PATCH v5 2/2] Add a --outdated option to reindexdb

This uses the new OUTDATED option for REINDEX.  If user asks for multiple job,
the list of tables to process will be sorted by the total size of underlying
indexes that have outdated dependency.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by:
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 src/bin/scripts/reindexdb.c        | 143 ++++++++++++++++++++++++-----
 src/bin/scripts/t/090_reindexdb.pl |  34 ++++++-
 2 files changed, 151 insertions(+), 26 deletions(-)

diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index cf28176243..369d164e65 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -35,20 +35,23 @@ typedef enum ReindexType
 static SimpleStringList *get_parallel_object_list(PGconn *conn,
 												  ReindexType type,
 												  SimpleStringList *user_list,
+												  bool outdated,
 												  bool echo);
 static void reindex_one_database(const ConnParams *cparams, ReindexType type,
 								 SimpleStringList *user_list,
 								 const char *progname,
 								 bool echo, bool verbose, bool concurrently,
-								 int concurrentCons, const char *tablespace);
+								 int concurrentCons, const char *tablespace,
+								 bool outdated);
 static void reindex_all_databases(ConnParams *cparams,
 								  const char *progname, bool echo,
 								  bool quiet, bool verbose, bool concurrently,
-								  int concurrentCons, const char *tablespace);
+								  int concurrentCons, const char *tablespace,
+								  bool outdated);
 static void run_reindex_command(PGconn *conn, ReindexType type,
 								const char *name, bool echo, bool verbose,
 								bool concurrently, bool async,
-								const char *tablespace);
+								const char *tablespace, bool outdated);
 
 static void help(const char *progname);
 
@@ -74,6 +77,7 @@ main(int argc, char *argv[])
 		{"concurrently", no_argument, NULL, 1},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"tablespace", required_argument, NULL, 3},
+		{"outdated", no_argument, NULL, 4},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -95,6 +99,7 @@ main(int argc, char *argv[])
 	bool		quiet = false;
 	bool		verbose = false;
 	bool		concurrently = false;
+	bool		outdated = false;
 	SimpleStringList indexes = {NULL, NULL};
 	SimpleStringList tables = {NULL, NULL};
 	SimpleStringList schemas = {NULL, NULL};
@@ -170,6 +175,9 @@ main(int argc, char *argv[])
 			case 3:
 				tablespace = pg_strdup(optarg);
 				break;
+			case 4:
+				outdated = true;
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -234,7 +242,8 @@ main(int argc, char *argv[])
 		cparams.dbname = maintenance_db;
 
 		reindex_all_databases(&cparams, progname, echo, quiet, verbose,
-							  concurrently, concurrentCons, tablespace);
+							  concurrently, concurrentCons, tablespace,
+							  outdated);
 	}
 	else if (syscatalog)
 	{
@@ -253,12 +262,17 @@ main(int argc, char *argv[])
 			pg_log_error("cannot reindex specific index(es) and system catalogs at the same time");
 			exit(1);
 		}
-
 		if (concurrentCons > 1)
 		{
 			pg_log_error("cannot use multiple jobs to reindex system catalogs");
 			exit(1);
 		}
+		if (outdated)
+		{
+			pg_log_error("cannot filter indexes having outdated dependencies "
+						 "and reindex system catalogs at the same time");
+			exit(1);
+		}
 
 		if (dbname == NULL)
 		{
@@ -274,7 +288,7 @@ main(int argc, char *argv[])
 
 		reindex_one_database(&cparams, REINDEX_SYSTEM, NULL,
 							 progname, echo, verbose,
-							 concurrently, 1, tablespace);
+							 concurrently, 1, tablespace, outdated);
 	}
 	else
 	{
@@ -304,17 +318,20 @@ main(int argc, char *argv[])
 		if (schemas.head != NULL)
 			reindex_one_database(&cparams, REINDEX_SCHEMA, &schemas,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 
 		if (indexes.head != NULL)
 			reindex_one_database(&cparams, REINDEX_INDEX, &indexes,
 								 progname, echo, verbose,
-								 concurrently, 1, tablespace);
+								 concurrently, 1, tablespace,
+								 outdated);
 
 		if (tables.head != NULL)
 			reindex_one_database(&cparams, REINDEX_TABLE, &tables,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 
 		/*
 		 * reindex database only if neither index nor table nor schema is
@@ -323,7 +340,8 @@ main(int argc, char *argv[])
 		if (indexes.head == NULL && tables.head == NULL && schemas.head == NULL)
 			reindex_one_database(&cparams, REINDEX_DATABASE, NULL,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 	}
 
 	exit(0);
@@ -334,7 +352,7 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
 					 SimpleStringList *user_list,
 					 const char *progname, bool echo,
 					 bool verbose, bool concurrently, int concurrentCons,
-					 const char *tablespace)
+					 const char *tablespace, bool outdated)
 {
 	PGconn	   *conn;
 	SimpleStringListCell *cell;
@@ -363,6 +381,14 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
 		exit(1);
 	}
 
+	if (outdated && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_log_error("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+					 "outdated", "14");
+		exit(1);
+	}
+
 	if (!parallel)
 	{
 		switch (process_type)
@@ -399,14 +425,24 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
 				 */
 				if (concurrently)
 					pg_log_warning("cannot reindex system catalogs concurrently, skipping all");
+				else if (outdated)
+				{
+					/*
+					 * The only supported kind of object that can be outdated
+					 * is collation.  No system catalog has any index that can
+					 * depend on an outdated collation, so skip system
+					 * catalogs.
+					 */
+				}
 				else
 					run_reindex_command(conn, REINDEX_SYSTEM, PQdb(conn), echo,
 										verbose, concurrently, false,
-										tablespace);
+										tablespace, outdated);
 
 				/* Build a list of relations from the database */
 				process_list = get_parallel_object_list(conn, process_type,
-														user_list, echo);
+														user_list, outdated,
+														echo);
 				process_type = REINDEX_TABLE;
 
 				/* Bail out if nothing to process */
@@ -419,7 +455,8 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
 
 				/* Build a list of relations from all the schemas */
 				process_list = get_parallel_object_list(conn, process_type,
-														user_list, echo);
+														user_list, outdated,
+														echo);
 				process_type = REINDEX_TABLE;
 
 				/* Bail out if nothing to process */
@@ -484,7 +521,8 @@ reindex_one_database(const ConnParams *cparams, ReindexType type,
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
 		run_reindex_command(free_slot->connection, process_type, objname,
-							echo, verbose, concurrently, true, tablespace);
+							echo, verbose, concurrently, true, tablespace,
+							outdated);
 
 		cell = cell->next;
 	} while (cell != NULL);
@@ -509,7 +547,7 @@ finish:
 static void
 run_reindex_command(PGconn *conn, ReindexType type, const char *name,
 					bool echo, bool verbose, bool concurrently, bool async,
-					const char *tablespace)
+					const char *tablespace, bool outdated)
 {
 	const char *paren = "(";
 	const char *comma = ", ";
@@ -536,6 +574,12 @@ run_reindex_command(PGconn *conn, ReindexType type, const char *name,
 		sep = comma;
 	}
 
+	if (outdated)
+	{
+		appendPQExpBuffer(&sql, "%sOUTDATED", sep);
+		sep = comma;
+	}
+
 	if (sep != paren)
 		appendPQExpBufferStr(&sql, ") ");
 
@@ -641,7 +685,7 @@ run_reindex_command(PGconn *conn, ReindexType type, const char *name,
  */
 static SimpleStringList *
 get_parallel_object_list(PGconn *conn, ReindexType type,
-						 SimpleStringList *user_list, bool echo)
+						 SimpleStringList *user_list, bool outdated, bool echo)
 {
 	PQExpBufferData catalog_query;
 	PQExpBufferData buf;
@@ -660,16 +704,41 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 	{
 		case REINDEX_DATABASE:
 			Assert(user_list == NULL);
+
 			appendPQExpBufferStr(&catalog_query,
 								 "SELECT c.relname, ns.nspname\n"
 								 " FROM pg_catalog.pg_class c\n"
 								 " JOIN pg_catalog.pg_namespace ns"
-								 " ON c.relnamespace = ns.oid\n"
+								 " ON c.relnamespace = ns.oid\n");
+
+			if (outdated)
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " JOIN pg_catalog.pg_index i"
+									 " ON c.oid = i.indrelid\n"
+									 " JOIN pg_catalog.pg_class ci"
+									 " ON i.indexrelid = ci.oid\n");
+			}
+
+			appendPQExpBufferStr(&catalog_query,
 								 " WHERE ns.nspname != 'pg_catalog'\n"
 								 "   AND c.relkind IN ("
 								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) ")\n"
-								 " ORDER BY c.relpages DESC;");
+								 CppAsString2(RELKIND_MATVIEW) ")\n");
+
+			if (outdated)
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " GROUP BY c.relname, ns.nspname\n"
+									 " ORDER BY sum(ci.relpages)"
+									 " FILTER (WHERE pg_catalog.pg_index_has_outdated_dependency(ci.oid)) DESC;");
+			}
+			else
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " ORDER BY c.relpages DESC;");
+			}
+
 			break;
 
 		case REINDEX_SCHEMA:
@@ -687,7 +756,18 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 									 "SELECT c.relname, ns.nspname\n"
 									 " FROM pg_catalog.pg_class c\n"
 									 " JOIN pg_catalog.pg_namespace ns"
-									 " ON c.relnamespace = ns.oid\n"
+									 " ON c.relnamespace = ns.oid\n");
+
+				if (outdated)
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " JOIN pg_catalog.pg_index i"
+										 " ON c.oid = i.indrelid\n"
+										 " JOIN pg_catalog.pg_class ci"
+										 " ON i.indexrelid = ci.oid\n");
+				}
+
+				appendPQExpBufferStr(&catalog_query,
 									 " WHERE c.relkind IN ("
 									 CppAsString2(RELKIND_RELATION) ", "
 									 CppAsString2(RELKIND_MATVIEW) ")\n"
@@ -705,8 +785,20 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 					appendStringLiteralConn(&catalog_query, nspname, conn);
 				}
 
-				appendPQExpBufferStr(&catalog_query, ")\n"
-									 " ORDER BY c.relpages DESC;");
+				appendPQExpBufferStr(&catalog_query, ")\n");
+
+				if (outdated)
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " GROUP BY c.relname, ns.nspname\n"
+										 " ORDER BY sum(ci.relpages)"
+										 " FILTER (WHERE pg_catalog.pg_index_has_outdated_dependency(ci.oid)) DESC;");
+				}
+				else
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " ORDER BY c.relpages DESC;");
+				}
 			}
 			break;
 
@@ -754,7 +846,7 @@ static void
 reindex_all_databases(ConnParams *cparams,
 					  const char *progname, bool echo, bool quiet, bool verbose,
 					  bool concurrently, int concurrentCons,
-					  const char *tablespace)
+					  const char *tablespace, bool outdated)
 {
 	PGconn	   *conn;
 	PGresult   *result;
@@ -778,7 +870,7 @@ reindex_all_databases(ConnParams *cparams,
 
 		reindex_one_database(cparams, REINDEX_DATABASE, NULL,
 							 progname, echo, verbose, concurrently,
-							 concurrentCons, tablespace);
+							 concurrentCons, tablespace, outdated);
 	}
 
 	PQclear(result);
@@ -797,6 +889,7 @@ help(const char *progname)
 	printf(_("  -e, --echo                   show the commands being sent to the server\n"));
 	printf(_("  -i, --index=INDEX            recreate specific index(es) only\n"));
 	printf(_("  -j, --jobs=NUM               use this many concurrent connections to reindex\n"));
+	printf(_("      --outdated               only process indexes having outdated depencies\n"));
 	printf(_("  -q, --quiet                  don't write any messages\n"));
 	printf(_("  -s, --system                 reindex system catalogs\n"));
 	printf(_("  -S, --schema=SCHEMA          reindex specific schema(s) only\n"));
diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl
index 159b637230..b60cac6081 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -3,7 +3,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 58;
+use Test::More tests => 70;
 
 program_help_ok('reindexdb');
 program_version_ok('reindexdb');
@@ -174,6 +174,9 @@ $node->command_fails(
 $node->command_fails(
 	[ 'reindexdb', '-j', '2', '-i', 'i1', 'postgres' ],
 	'parallel reindexdb cannot process indexes');
+$node->command_fails(
+	[ 'reindexdb', '-s', '--outdated' ],
+	'cannot reindex system catalog and filter indexes having outdated dependencies');
 $node->issues_sql_like(
 	[ 'reindexdb', '-j', '2', 'postgres' ],
 	qr/statement:\ REINDEX SYSTEM postgres;
@@ -196,3 +199,32 @@ $node->command_checks_all(
 		qr/^reindexdb: warning: cannot reindex system catalogs concurrently, skipping all/s
 	],
 	'parallel reindexdb for system with --concurrently skips catalogs');
+
+# Temporarily downgrade client-min-message to get the no-op report
+$ENV{PGOPTIONS} = '--client-min-messages=NOTICE';
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-t', 's1.t1', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/table "t1" has no indexes to reindex/],
+	'verbose reindexdb for outdated dependencies on a specific table reports no-op tables');
+
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-d', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/^$/],
+	'verbose reindexdb for outdated dependencies database wide silently ignore all tables');
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-j', '2', '-d', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/table "t1" has no indexes to reindex/],
+	'parallel verbose reindexdb for outdated dependencies database wide reports no-op tables');
+
+# Switch back to WARNING client-min-message
+$ENV{PGOPTIONS} = '--client-min-messages=WARNING';
+$node->issues_sql_like(
+	[ 'reindexdb', '--outdated', '-t', 's1.t1', 'postgres' ],
+	qr/.*statement: REINDEX \(OUTDATED\) TABLE s1\.t1;/,
+	'reindexdb for outdated dependencies specify the OUTDATED keyword');
-- 
2.30.1

#22

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#21)

2 attachment(s)

Re: REINDEX backend filtering

On Wed, Mar 03, 2021 at 01:56:59PM +0800, Julien Rouhaud wrote:

Please find attached v5 which address all previous comments:

- consistently use "outdated"
- use REINDEX (OUTDATED) grammar (with a new unreserved OUTDATED keyword)
- new --outdated option to reindexdb
- expose a new "pg_index_has_outdated_dependency(regclass)" SQL function
- use that function in reindexdb --outdated to sort tables by total
indexes-to-be-processed size

v6 attached, rebase only due to conflict with recent commit.

Attachments:

v6-0001-Add-a-new-OUTDATED-filtering-facility-for-REINDEX.patchtext/x-diff; charset=us-asciiDownload

From d2e05e6f64c88b0d5074b9963586bf6999276762 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v6 1/2] Add a new OUTDATED filtering facility for REINDEX
 command.

OUTDATED is added a new unreserved keyword.

When used, REINDEX will only process indexes that have an outdated dependency.
For now, only dependency on collations are supported but we'll likely support
other kind of dependency in the future.

Also add a new pg_index_has_outdated_dependency(regclass) SQL function, so
client code can filter such indexes if needed.  This function will also be used
in a following commit to teach reindexdb to use this new OUTDATED option and
order the tables by the amount of work that will actually be done.

Catversion (should be) bumped.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by:
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 doc/src/sgml/func.sgml                     |  27 ++++--
 doc/src/sgml/ref/reindex.sgml              |  12 +++
 src/backend/catalog/index.c                | 107 ++++++++++++++++++++-
 src/backend/commands/indexcmds.c           |  12 ++-
 src/backend/parser/gram.y                  |   4 +-
 src/backend/utils/cache/relcache.c         |  40 ++++++++
 src/bin/psql/tab-complete.c                |   2 +-
 src/include/catalog/index.h                |   3 +
 src/include/catalog/pg_proc.dat            |   4 +
 src/include/parser/kwlist.h                |   1 +
 src/include/utils/relcache.h               |   1 +
 src/test/regress/expected/create_index.out |  10 ++
 src/test/regress/sql/create_index.sql      |  10 ++
 13 files changed, 221 insertions(+), 12 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9492a3c6b9..0eda6678ac 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26448,12 +26448,13 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
 
    <para>
     <xref linkend="functions-admin-index-table"/> shows the functions
-    available for index maintenance tasks.  (Note that these maintenance
-    tasks are normally done automatically by autovacuum; use of these
-    functions is only required in special cases.)
-    These functions cannot be executed during recovery.
-    Use of these functions is restricted to superusers and the owner
-    of the given index.
+    available for index maintenance tasks.  (Note that the maintenance
+    tasks performing actions on indexes are normally done automatically by
+    autovacuum; use of these functions is only required in special cases.)
+    The functions performing actions on indexes cannot be executed during
+    recovery.
+    Use of the functions performing actions on indexes is restricted to
+    superusers and the owner of the given index.
    </para>
 
    <table id="functions-admin-index-table">
@@ -26538,6 +26539,20 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
         option.
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_index_has_outdated_dependency</primary>
+        </indexterm>
+        <function>pg_index_has_outdated_dependency</function> ( <parameter>index</parameter> <type>regclass</type> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Check if the specified index has any outdated dependency.  For now only
+        dependency on collations are supported.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index ff4dba8c36..aa66e6461f 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -26,6 +26,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
     CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
+    OUTDATED [ <replaceable class="parameter">boolean</replaceable> ]
     TABLESPACE <replaceable class="parameter">new_tablespace</replaceable>
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
 </synopsis>
@@ -188,6 +189,17 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>OUTDATED</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild and only
+      process indexes that have outdated dependencies.  Fow now, the only
+      handle dependency is for the collation provider version.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>TABLESPACE</literal></term>
     <listitem>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4ef61b5efd..571feac5db 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -100,6 +100,12 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+	Oid relid;	/* targetr index oid */
+	bool outdated;	/* depends on at least on deprected collation? */
+} IndexHasOutdatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1351,6 +1357,105 @@ index_check_collation_versions(Oid relid)
 	list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one outdated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_outdated_collation(const ObjectAddress *otherObject,
+										const char *version,
+										char **new_version,
+										void *data)
+{
+	IndexHasOutdatedColl *context = data;
+	char *current_version;
+
+	/* We only care about dependencies on collations. */
+	if (otherObject->classId != CollationRelationId)
+		return false;
+
+	/* Fast exit if we already found a outdated collation version. */
+	if (context->outdated)
+		return false;
+
+	/* Ask the provider for the current version.  Give up if unsupported. */
+	current_version = get_collation_version_for_oid(otherObject->objectId,
+													false);
+	if (!current_version)
+		return false;
+
+	if (!version || strcmp(version, current_version) != 0)
+		context->outdated = true;
+
+	return false;
+}
+
+Datum
+pg_index_has_outdated_dependency(PG_FUNCTION_ARGS)
+{
+	Oid			indexOid = PG_GETARG_OID(0);
+	Relation	rel;
+	bool		isIndex;
+	bool		res;
+
+	rel = try_relation_open(indexOid, AccessShareLock);
+
+	if (rel == NULL)
+		PG_RETURN_NULL();
+
+	isIndex = rel->rd_rel->relkind == RELKIND_INDEX;
+
+	if (!isIndex)
+	{
+		relation_close(rel, AccessShareLock);
+		PG_RETURN_NULL();
+	}
+
+	res = index_has_outdated_dependency(indexOid);
+
+	relation_close(rel, AccessShareLock);
+
+	PG_RETURN_BOOL(res);
+}
+
+/*
+ * Check whether the given index has a dependency with an outdated
+ * collation version.
+ * Caller must hold a suitable lock and make sure that the given Oid belongs to
+ * an index.
+ */
+bool
+index_has_outdated_collation(Oid indexOid)
+{
+	ObjectAddress object;
+	IndexHasOutdatedColl context;
+
+	object.classId = RelationRelationId;
+	object.objectId = indexOid;
+	object.objectSubId = 0;
+
+	context.relid = indexOid;
+	context.outdated = false;
+
+	visitDependenciesOf(&object, &do_check_index_has_outdated_collation,
+						&context);
+
+	return context.outdated;
+}
+
+/*
+ * Check whether the given index has a dependency with an outdated
+ * refobjversion.
+ * Caller must hold a suitable lock and make sure that the given Oid belongs to
+ * an index.
+ * For now, only dependency on collations are supported.
+ */
+bool
+index_has_outdated_dependency(Oid indexOid)
+{
+	return index_has_outdated_collation(indexOid);
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3991,7 +4096,7 @@ reindex_relation(Oid relid, int flags, ReindexParams *params)
 	 * relcache to get this with a sequential scan if ignoring system
 	 * indexes.)
 	 */
-	indexIds = RelationGetIndexList(rel);
+	indexIds = RelationGetIndexListFiltered(rel, params->options);
 
 	if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8bc652ecd3..0e1bcad101 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2484,6 +2484,7 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 	bool		concurrently = false;
 	bool		verbose = false;
 	char	   *tablespacename = NULL;
+	bool		outdated_filter = false;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -2496,6 +2497,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 			concurrently = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "tablespace") == 0)
 			tablespacename = defGetString(opt);
+		else if (strcmp(opt->defname, "outdated") == 0)
+			outdated_filter = defGetBoolean(opt);
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -2510,7 +2513,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 
 	params.options =
 		(verbose ? REINDEXOPT_VERBOSE : 0) |
-		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+		(concurrently ? REINDEXOPT_CONCURRENTLY : 0) |
+		(outdated_filter ? REINDEXOPT_OUTDATED : 0);
 
 	/*
 	 * Assign the tablespace OID to move indexes to, with InvalidOid to do
@@ -3307,7 +3311,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 									RelationGetRelationName(heapRelation))));
 
 				/* Add all the valid indexes of relation to list */
-				foreach(lc, RelationGetIndexList(heapRelation))
+				foreach(lc, RelationGetIndexListFiltered(heapRelation,
+														 params->options))
 				{
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
@@ -3359,7 +3364,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 
 					MemoryContextSwitchTo(oldcontext);
 
-					foreach(lc2, RelationGetIndexList(toastRelation))
+					foreach(lc2, RelationGetIndexListFiltered(toastRelation,
+															  params->options))
 					{
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 652be0b96d..a67cfa867c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -674,7 +674,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	NULLS_P NUMERIC
 
 	OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR
-	ORDER ORDINALITY OTHERS OUT_P OUTER_P
+	ORDER ORDINALITY OTHERS OUT_P OUTDATED OUTER_P
 	OVER OVERLAPS OVERLAY OVERRIDING OWNED OWNER
 
 	PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY
@@ -15430,6 +15430,7 @@ unreserved_keyword:
 			| OPTIONS
 			| ORDINALITY
 			| OTHERS
+			| OUTDATED
 			| OVER
 			| OVERRIDING
 			| OWNED
@@ -16002,6 +16003,7 @@ bare_label_keyword:
 			| ORDINALITY
 			| OTHERS
 			| OUT_P
+			| OUTDATED
 			| OUTER_P
 			| OVERLAY
 			| OVERRIDING
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..06545ee550 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4620,6 +4620,46 @@ RelationGetIndexList(Relation relation)
 	return result;
 }
 
+/*
+ * RelationGetIndexListFiltered -- get a filtered list of indexes on this
+ * relation.
+ *
+ * Calls RelationGetIndexList and only keep indexes that have an outdated
+ * dependency.  For now, only collation version dependency is supported.
+ */
+List *
+RelationGetIndexListFiltered(Relation relation, bits32 options)
+{
+	List	   *result,
+			   *full_list;
+	ListCell   *lc;
+
+	full_list = RelationGetIndexList(relation);
+
+	/* Fast exit if no filtering was asked, or if the list if empty. */
+	if (((options & REINDEXOPT_OUTDATED) == 0) || full_list == NIL)
+		return full_list;
+
+	result = NIL;
+	foreach(lc, full_list)
+	{
+		Oid		indexOid = lfirst_oid(lc);
+
+		/*
+		 * Check for outdated collation version dependency.
+		 */
+		if (index_has_outdated_collation(indexOid))
+		{
+			result = lappend_oid(result, indexOid);
+			continue;
+		}
+
+		/* Didn't find any outdated dependency, index will be ignored. */
+	}
+
+	return result;
+}
+
 /*
  * RelationGetStatExtList
  *		get a list of OIDs of statistics objects on this relation
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ecdb8d752b..0843f36580 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3674,7 +3674,7 @@ psql_completion(const char *text, int start, int end)
 		 * one word, so the above test is correct.
 		 */
 		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
-			COMPLETE_WITH("CONCURRENTLY", "TABLESPACE", "VERBOSE");
+			COMPLETE_WITH("CONCURRENTLY", "OUTDATED'", "TABLESPACE", "VERBOSE");
 		else if (TailMatches("TABLESPACE"))
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e22d506436..298c0c633c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -42,6 +42,7 @@ typedef struct ReindexParams
 #define REINDEXOPT_REPORT_PROGRESS 0x02 /* report pgstat progress */
 #define REINDEXOPT_MISSING_OK 	0x04	/* skip missing relations */
 #define REINDEXOPT_CONCURRENTLY	0x08	/* concurrent mode */
+#define REINDEXOPT_OUTDATED		0x10/* outdated collation only */
 
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
@@ -137,6 +138,8 @@ extern void FormIndexDatum(IndexInfo *indexInfo,
 						   bool *isnull);
 
 extern void index_check_collation_versions(Oid relid);
+extern bool index_has_outdated_collation(Oid indexOid);
+extern bool index_has_outdated_dependency(Oid indexOid);
 extern void index_update_collation_versions(Oid relid, Oid coll);
 
 extern void index_build(Relation heapRelation,
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 93393fcfd4..911c12ee2c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -949,6 +949,10 @@
   proname => 'pg_indexam_has_property', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid text',
   prosrc => 'pg_indexam_has_property' },
+{ oid => '8102', descr => 'test property of an index',
+  proname => 'pg_index_has_outdated_dependency', provolatile => 's',
+  prorettype => 'bool', proargtypes => 'regclass',
+  prosrc => 'pg_index_has_outdated_dependency' },
 { oid => '637', descr => 'test property of an index',
   proname => 'pg_index_has_property', provolatile => 's', prorettype => 'bool',
   proargtypes => 'regclass text', prosrc => 'pg_index_has_property' },
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 28083aaac9..e6c725d9a6 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -295,6 +295,7 @@ PG_KEYWORD("order", ORDER, RESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("ordinality", ORDINALITY, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("others", OTHERS, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("out", OUT_P, COL_NAME_KEYWORD, BARE_LABEL)
+PG_KEYWORD("outdated", OUTDATED, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("outer", OUTER_P, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
 PG_KEYWORD("over", OVER, UNRESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("overlaps", OVERLAPS, TYPE_FUNC_NAME_KEYWORD, AS_LABEL)
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79323..a7a2272abd 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -45,6 +45,7 @@ extern void RelationClose(Relation relation);
  */
 extern List *RelationGetFKeyList(Relation relation);
 extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetIndexListFiltered(Relation relation, bits32 options);
 extern List *RelationGetStatExtList(Relation relation);
 extern Oid	RelationGetPrimaryKeyIndex(Relation relation);
 extern Oid	RelationGetReplicaIndex(Relation relation);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 830fdddf24..5f7c49c650 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2018,6 +2018,16 @@ INFO:  index "reindex_verbose_pkey" was reindexed
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 --
+-- REINDEX (OUTDATED)
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (OUTDATED) TABLE reindex_coll;
+NOTICE:  table "reindex_coll" has no indexes to reindex
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+--
 -- REINDEX CONCURRENTLY
 --
 CREATE TABLE concur_reindex_tab (c1 int);
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 8bc76f7c6f..808cacee5d 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -790,6 +790,16 @@ REINDEX (VERBOSE) TABLE reindex_verbose;
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 
+--
+-- REINDEX (OUTDATED)
+--
+CREATE TABLE reindex_coll(id integer primary key);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (OUTDATED) TABLE reindex_coll;
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+
 --
 -- REINDEX CONCURRENTLY
 --
-- 
2.30.1

v6-0002-Add-a-outdated-option-to-reindexdb.patchtext/x-diff; charset=us-asciiDownload

From 6fe675dbe236d6fae1fbd2ec0b1f9b83607014e9 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 25 Feb 2021 01:33:58 +0800
Subject: [PATCH v6 2/2] Add a --outdated option to reindexdb

This uses the new OUTDATED option for REINDEX.  If user asks for multiple job,
the list of tables to process will be sorted by the total size of underlying
indexes that have outdated dependency.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by:
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 src/bin/scripts/reindexdb.c        | 143 ++++++++++++++++++++++++-----
 src/bin/scripts/t/090_reindexdb.pl |  34 ++++++-
 2 files changed, 151 insertions(+), 26 deletions(-)

diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index fc0681538a..addea5c1ed 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -35,20 +35,23 @@ typedef enum ReindexType
 static SimpleStringList *get_parallel_object_list(PGconn *conn,
 												  ReindexType type,
 												  SimpleStringList *user_list,
+												  bool outdated,
 												  bool echo);
 static void reindex_one_database(ConnParams *cparams, ReindexType type,
 								 SimpleStringList *user_list,
 								 const char *progname,
 								 bool echo, bool verbose, bool concurrently,
-								 int concurrentCons, const char *tablespace);
+								 int concurrentCons, const char *tablespace,
+								 bool outdated);
 static void reindex_all_databases(ConnParams *cparams,
 								  const char *progname, bool echo,
 								  bool quiet, bool verbose, bool concurrently,
-								  int concurrentCons, const char *tablespace);
+								  int concurrentCons, const char *tablespace,
+								  bool outdated);
 static void run_reindex_command(PGconn *conn, ReindexType type,
 								const char *name, bool echo, bool verbose,
 								bool concurrently, bool async,
-								const char *tablespace);
+								const char *tablespace, bool outdated);
 
 static void help(const char *progname);
 
@@ -74,6 +77,7 @@ main(int argc, char *argv[])
 		{"concurrently", no_argument, NULL, 1},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"tablespace", required_argument, NULL, 3},
+		{"outdated", no_argument, NULL, 4},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -95,6 +99,7 @@ main(int argc, char *argv[])
 	bool		quiet = false;
 	bool		verbose = false;
 	bool		concurrently = false;
+	bool		outdated = false;
 	SimpleStringList indexes = {NULL, NULL};
 	SimpleStringList tables = {NULL, NULL};
 	SimpleStringList schemas = {NULL, NULL};
@@ -170,6 +175,9 @@ main(int argc, char *argv[])
 			case 3:
 				tablespace = pg_strdup(optarg);
 				break;
+			case 4:
+				outdated = true;
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -234,7 +242,8 @@ main(int argc, char *argv[])
 		cparams.dbname = maintenance_db;
 
 		reindex_all_databases(&cparams, progname, echo, quiet, verbose,
-							  concurrently, concurrentCons, tablespace);
+							  concurrently, concurrentCons, tablespace,
+							  outdated);
 	}
 	else if (syscatalog)
 	{
@@ -253,12 +262,17 @@ main(int argc, char *argv[])
 			pg_log_error("cannot reindex specific index(es) and system catalogs at the same time");
 			exit(1);
 		}
-
 		if (concurrentCons > 1)
 		{
 			pg_log_error("cannot use multiple jobs to reindex system catalogs");
 			exit(1);
 		}
+		if (outdated)
+		{
+			pg_log_error("cannot filter indexes having outdated dependencies "
+						 "and reindex system catalogs at the same time");
+			exit(1);
+		}
 
 		if (dbname == NULL)
 		{
@@ -274,7 +288,7 @@ main(int argc, char *argv[])
 
 		reindex_one_database(&cparams, REINDEX_SYSTEM, NULL,
 							 progname, echo, verbose,
-							 concurrently, 1, tablespace);
+							 concurrently, 1, tablespace, outdated);
 	}
 	else
 	{
@@ -304,17 +318,20 @@ main(int argc, char *argv[])
 		if (schemas.head != NULL)
 			reindex_one_database(&cparams, REINDEX_SCHEMA, &schemas,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 
 		if (indexes.head != NULL)
 			reindex_one_database(&cparams, REINDEX_INDEX, &indexes,
 								 progname, echo, verbose,
-								 concurrently, 1, tablespace);
+								 concurrently, 1, tablespace,
+								 outdated);
 
 		if (tables.head != NULL)
 			reindex_one_database(&cparams, REINDEX_TABLE, &tables,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 
 		/*
 		 * reindex database only if neither index nor table nor schema is
@@ -323,7 +340,8 @@ main(int argc, char *argv[])
 		if (indexes.head == NULL && tables.head == NULL && schemas.head == NULL)
 			reindex_one_database(&cparams, REINDEX_DATABASE, NULL,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 	}
 
 	exit(0);
@@ -334,7 +352,7 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 					 SimpleStringList *user_list,
 					 const char *progname, bool echo,
 					 bool verbose, bool concurrently, int concurrentCons,
-					 const char *tablespace)
+					 const char *tablespace, bool outdated)
 {
 	PGconn	   *conn;
 	SimpleStringListCell *cell;
@@ -363,6 +381,14 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 		exit(1);
 	}
 
+	if (outdated && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_log_error("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+					 "outdated", "14");
+		exit(1);
+	}
+
 	if (!parallel)
 	{
 		switch (process_type)
@@ -399,14 +425,24 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 				 */
 				if (concurrently)
 					pg_log_warning("cannot reindex system catalogs concurrently, skipping all");
+				else if (outdated)
+				{
+					/*
+					 * The only supported kind of object that can be outdated
+					 * is collation.  No system catalog has any index that can
+					 * depend on an outdated collation, so skip system
+					 * catalogs.
+					 */
+				}
 				else
 					run_reindex_command(conn, REINDEX_SYSTEM, PQdb(conn), echo,
 										verbose, concurrently, false,
-										tablespace);
+										tablespace, outdated);
 
 				/* Build a list of relations from the database */
 				process_list = get_parallel_object_list(conn, process_type,
-														user_list, echo);
+														user_list, outdated,
+														echo);
 				process_type = REINDEX_TABLE;
 
 				/* Bail out if nothing to process */
@@ -419,7 +455,8 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 
 				/* Build a list of relations from all the schemas */
 				process_list = get_parallel_object_list(conn, process_type,
-														user_list, echo);
+														user_list, outdated,
+														echo);
 				process_type = REINDEX_TABLE;
 
 				/* Bail out if nothing to process */
@@ -485,7 +522,8 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
 		run_reindex_command(free_slot->connection, process_type, objname,
-							echo, verbose, concurrently, true, tablespace);
+							echo, verbose, concurrently, true, tablespace,
+							outdated);
 
 		cell = cell->next;
 	} while (cell != NULL);
@@ -510,7 +548,7 @@ finish:
 static void
 run_reindex_command(PGconn *conn, ReindexType type, const char *name,
 					bool echo, bool verbose, bool concurrently, bool async,
-					const char *tablespace)
+					const char *tablespace, bool outdated)
 {
 	const char *paren = "(";
 	const char *comma = ", ";
@@ -537,6 +575,12 @@ run_reindex_command(PGconn *conn, ReindexType type, const char *name,
 		sep = comma;
 	}
 
+	if (outdated)
+	{
+		appendPQExpBuffer(&sql, "%sOUTDATED", sep);
+		sep = comma;
+	}
+
 	if (sep != paren)
 		appendPQExpBufferStr(&sql, ") ");
 
@@ -642,7 +686,7 @@ run_reindex_command(PGconn *conn, ReindexType type, const char *name,
  */
 static SimpleStringList *
 get_parallel_object_list(PGconn *conn, ReindexType type,
-						 SimpleStringList *user_list, bool echo)
+						 SimpleStringList *user_list, bool outdated, bool echo)
 {
 	PQExpBufferData catalog_query;
 	PQExpBufferData buf;
@@ -661,16 +705,41 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 	{
 		case REINDEX_DATABASE:
 			Assert(user_list == NULL);
+
 			appendPQExpBufferStr(&catalog_query,
 								 "SELECT c.relname, ns.nspname\n"
 								 " FROM pg_catalog.pg_class c\n"
 								 " JOIN pg_catalog.pg_namespace ns"
-								 " ON c.relnamespace = ns.oid\n"
+								 " ON c.relnamespace = ns.oid\n");
+
+			if (outdated)
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " JOIN pg_catalog.pg_index i"
+									 " ON c.oid = i.indrelid\n"
+									 " JOIN pg_catalog.pg_class ci"
+									 " ON i.indexrelid = ci.oid\n");
+			}
+
+			appendPQExpBufferStr(&catalog_query,
 								 " WHERE ns.nspname != 'pg_catalog'\n"
 								 "   AND c.relkind IN ("
 								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) ")\n"
-								 " ORDER BY c.relpages DESC;");
+								 CppAsString2(RELKIND_MATVIEW) ")\n");
+
+			if (outdated)
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " GROUP BY c.relname, ns.nspname\n"
+									 " ORDER BY sum(ci.relpages)"
+									 " FILTER (WHERE pg_catalog.pg_index_has_outdated_dependency(ci.oid)) DESC;");
+			}
+			else
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " ORDER BY c.relpages DESC;");
+			}
+
 			break;
 
 		case REINDEX_SCHEMA:
@@ -688,7 +757,18 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 									 "SELECT c.relname, ns.nspname\n"
 									 " FROM pg_catalog.pg_class c\n"
 									 " JOIN pg_catalog.pg_namespace ns"
-									 " ON c.relnamespace = ns.oid\n"
+									 " ON c.relnamespace = ns.oid\n");
+
+				if (outdated)
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " JOIN pg_catalog.pg_index i"
+										 " ON c.oid = i.indrelid\n"
+										 " JOIN pg_catalog.pg_class ci"
+										 " ON i.indexrelid = ci.oid\n");
+				}
+
+				appendPQExpBufferStr(&catalog_query,
 									 " WHERE c.relkind IN ("
 									 CppAsString2(RELKIND_RELATION) ", "
 									 CppAsString2(RELKIND_MATVIEW) ")\n"
@@ -706,8 +786,20 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 					appendStringLiteralConn(&catalog_query, nspname, conn);
 				}
 
-				appendPQExpBufferStr(&catalog_query, ")\n"
-									 " ORDER BY c.relpages DESC;");
+				appendPQExpBufferStr(&catalog_query, ")\n");
+
+				if (outdated)
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " GROUP BY c.relname, ns.nspname\n"
+										 " ORDER BY sum(ci.relpages)"
+										 " FILTER (WHERE pg_catalog.pg_index_has_outdated_dependency(ci.oid)) DESC;");
+				}
+				else
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " ORDER BY c.relpages DESC;");
+				}
 			}
 			break;
 
@@ -755,7 +847,7 @@ static void
 reindex_all_databases(ConnParams *cparams,
 					  const char *progname, bool echo, bool quiet, bool verbose,
 					  bool concurrently, int concurrentCons,
-					  const char *tablespace)
+					  const char *tablespace, bool outdated)
 {
 	PGconn	   *conn;
 	PGresult   *result;
@@ -779,7 +871,7 @@ reindex_all_databases(ConnParams *cparams,
 
 		reindex_one_database(cparams, REINDEX_DATABASE, NULL,
 							 progname, echo, verbose, concurrently,
-							 concurrentCons, tablespace);
+							 concurrentCons, tablespace, outdated);
 	}
 
 	PQclear(result);
@@ -798,6 +890,7 @@ help(const char *progname)
 	printf(_("  -e, --echo                   show the commands being sent to the server\n"));
 	printf(_("  -i, --index=INDEX            recreate specific index(es) only\n"));
 	printf(_("  -j, --jobs=NUM               use this many concurrent connections to reindex\n"));
+	printf(_("      --outdated               only process indexes having outdated depencies\n"));
 	printf(_("  -q, --quiet                  don't write any messages\n"));
 	printf(_("  -s, --system                 reindex system catalogs\n"));
 	printf(_("  -S, --schema=SCHEMA          reindex specific schema(s) only\n"));
diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl
index 159b637230..b60cac6081 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -3,7 +3,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 58;
+use Test::More tests => 70;
 
 program_help_ok('reindexdb');
 program_version_ok('reindexdb');
@@ -174,6 +174,9 @@ $node->command_fails(
 $node->command_fails(
 	[ 'reindexdb', '-j', '2', '-i', 'i1', 'postgres' ],
 	'parallel reindexdb cannot process indexes');
+$node->command_fails(
+	[ 'reindexdb', '-s', '--outdated' ],
+	'cannot reindex system catalog and filter indexes having outdated dependencies');
 $node->issues_sql_like(
 	[ 'reindexdb', '-j', '2', 'postgres' ],
 	qr/statement:\ REINDEX SYSTEM postgres;
@@ -196,3 +199,32 @@ $node->command_checks_all(
 		qr/^reindexdb: warning: cannot reindex system catalogs concurrently, skipping all/s
 	],
 	'parallel reindexdb for system with --concurrently skips catalogs');
+
+# Temporarily downgrade client-min-message to get the no-op report
+$ENV{PGOPTIONS} = '--client-min-messages=NOTICE';
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-t', 's1.t1', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/table "t1" has no indexes to reindex/],
+	'verbose reindexdb for outdated dependencies on a specific table reports no-op tables');
+
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-d', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/^$/],
+	'verbose reindexdb for outdated dependencies database wide silently ignore all tables');
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-j', '2', '-d', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/table "t1" has no indexes to reindex/],
+	'parallel verbose reindexdb for outdated dependencies database wide reports no-op tables');
+
+# Switch back to WARNING client-min-message
+$ENV{PGOPTIONS} = '--client-min-messages=WARNING';
+$node->issues_sql_like(
+	[ 'reindexdb', '--outdated', '-t', 's1.t1', 'postgres' ],
+	qr/.*statement: REINDEX \(OUTDATED\) TABLE s1\.t1;/,
+	'reindexdb for outdated dependencies specify the OUTDATED keyword');
-- 
2.30.1

#23

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Julien Rouhaud (#22)

Re: REINDEX backend filtering

On Sun, Mar 14, 2021 at 04:10:07PM +0800, Julien Rouhaud wrote:

v6 attached, rebase only due to conflict with recent commit.

I have read through the patch.

+ bool outdated_filter = false;
Wouldn't it be better to rename that "outdated" instead for
consistency with the other options?

In ReindexRelationConcurrently(), there is no filtering done for the
index themselves. The operation is only done on the list of indexes
fetched from the parent relation. Why? This means that a REINDEX
(OUTDATED) INDEX would actually rebuild an index even if this index
has no out-of-date collations, like a catalog. I think that's
confusing.

Same comment for the non-concurrent case, as of the code paths of
reindex_index().

Would it be better to inform the user which indexes are getting
skipped in the verbose output if REINDEXOPT_VERBOSE is set?

+       <para>
+        Check if the specified index has any outdated dependency. For now only
+        dependency on collations are supported.
+       </para></entry>
[...]
+    <term><literal>OUTDATED</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild and only
+      process indexes that have outdated dependencies.  Fow now, the only
+      handle dependency is for the collation provider version.
+     </para>
Do we really need to be this specific in this part of the
documentation with collations?  The last sentence of this paragraph
sounds weird.  Don't you mean instead to write "the only dependency
handled currently is the collation provider version"?

+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (OUTDATED) TABLE reindex_coll;
What are those details?  And wouldn't it be more stable to just check
after the relfilenode of the indexes instead?

" ORDER BY sum(ci.relpages)"
Schema qualification here, twice.

+ printf(_(" --outdated only process indexes
having outdated depencies\n"));
s/depencies/dependencies/.

+   rel = try_relation_open(indexOid, AccessShareLock);
+
+   if (rel == NULL)
+       PG_RETURN_NULL();
Let's introduce a try_index_open() here.

What's the point in having both index_has_outdated_collation() and
index_has_outdated_collation()?

It seems to me that 0001 should be split into two patches:
- One for the backend OUTDATED option.
- One for pg_index_has_outdated_dependency(), which only makes sense
in-core once reindexdb is introduced.
--
Michael

#24

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Michael Paquier (#23)

Re: REINDEX backend filtering

On Sun, Mar 14, 2021 at 08:54:11PM +0900, Michael Paquier wrote:

On Sun, Mar 14, 2021 at 04:10:07PM +0800, Julien Rouhaud wrote:

+ bool outdated_filter = false;
Wouldn't it be better to rename that "outdated" instead for
consistency with the other options?

I agree.

In ReindexRelationConcurrently(), there is no filtering done for the
index themselves. The operation is only done on the list of indexes
fetched from the parent relation. Why? This means that a REINDEX
(OUTDATED) INDEX would actually rebuild an index even if this index
has no out-of-date collations, like a catalog. I think that's
confusing.

Same comment for the non-concurrent case, as of the code paths of
reindex_index().

Yes, I'm not sure what we should do in that case. I thought I put a comment
about that but it apparently disappeared during some rewrite.

Is there really a use case for reindexing a specific index and at the same time
asking for possibly ignoring it? I think we should just forbid REINDEX
(OUTDATED) INDEX. What do you think?

Would it be better to inform the user which indexes are getting
skipped in the verbose output if REINDEXOPT_VERBOSE is set?

I was thinking that users would be more interested in the list of indexes being
processed rather than the full list of indexes and a mention of whether it was
processed or not. I can change that if you prefer.

+       <para>
+        Check if the specified index has any outdated dependency. For now only
+        dependency on collations are supported.
+       </para></entry>
[...]
+    <term><literal>OUTDATED</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild and only
+      process indexes that have outdated dependencies.  Fow now, the only
+      handle dependency is for the collation provider version.
+     </para>
Do we really need to be this specific in this part of the
documentation with collations?

I think it's important to document what this option really does, and I don't
see a better place to document it.

The last sentence of this paragraph
sounds weird. Don't you mean instead to write "the only dependency
handled currently is the collation provider version"?

Indeed, I'll fix!

+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (OUTDATED) TABLE reindex_coll;
What are those details?

That just the same comment as the previous occurence in the file, I kept it for
consistency.

And wouldn't it be more stable to just check
after the relfilenode of the indexes instead?

Agreed, I'll add additional tests for that.

" ORDER BY sum(ci.relpages)"
Schema qualification here, twice.

Well, this isn't actually mandatory, per comment at the top:

/*
* The queries here are using a safe search_path, so there's no need to
* fully qualify everything.
*/

But I think it's a better style to fully qualify objects, so I'll fix.

+   rel = try_relation_open(indexOid, AccessShareLock);
+
+   if (rel == NULL)
+       PG_RETURN_NULL();
Let's introduce a try_index_open() here.

Good idea!

What's the point in having both index_has_outdated_collation() and
index_has_outdated_collation()?

Did you mean index_has_outdated_collation() and
index_has_outadted_dependency()? It's just to keep things separated, mostly
for future improvements on that infrastructure. I can get rid of that function
and put back the code in index_has_outadted_dependency() if that's overkill.

It seems to me that 0001 should be split into two patches:
- One for the backend OUTDATED option.
- One for pg_index_has_outdated_dependency(), which only makes sense
in-core once reindexdb is introduced.

I thought it would be better to add the backend part in a single commit, and
then built the client part on top of that in a different commit. I can
rearrange things if you want, but in that case should
index_has_outadted_dependency() be in a different patch as you mention or
simply merged with 0002 (the --oudated option for reindexdb)?

#25

Michael Paquier

michael@paquier.xyz

almost 5 years ago

In reply to: Julien Rouhaud (#24)

Re: REINDEX backend filtering

On Sun, Mar 14, 2021 at 10:57:37PM +0800, Julien Rouhaud wrote:

On Sun, Mar 14, 2021 at 08:54:11PM +0900, Michael Paquier wrote:

In ReindexRelationConcurrently(), there is no filtering done for the
index themselves. The operation is only done on the list of indexes
fetched from the parent relation. Why? This means that a REINDEX
(OUTDATED) INDEX would actually rebuild an index even if this index
has no out-of-date collations, like a catalog. I think that's
confusing.

Same comment for the non-concurrent case, as of the code paths of
reindex_index().

Yes, I'm not sure what we should do in that case. I thought I put a comment
about that but it apparently disappeared during some rewrite.

Is there really a use case for reindexing a specific index and at the same time
asking for possibly ignoring it? I think we should just forbid REINDEX
(OUTDATED) INDEX. What do you think?

I think that there would be cases to be able to handle that, say if a
user wants to works on a specific set of indexes one-by-one. There is
also the argument of inconsistency with the other commands.

I was thinking that users would be more interested in the list of indexes being
processed rather than the full list of indexes and a mention of whether it was
processed or not. I can change that if you prefer.

How invasive do you think it would be to add a note in the verbose
output when indexes are skipped?

Did you mean index_has_outdated_collation() and
index_has_outdated_dependency()? It's just to keep things separated, mostly
for future improvements on that infrastructure. I can get rid of that function
and put back the code in index_has_outadted_dependency() if that's overkill.

Yes, sorry. I meant index_has_outdated_collation() and
index_has_outdated_dependency().
--
Michael

#26

Mark Dilger

mark.dilger@enterprisedb.com

almost 5 years ago

In reply to: Julien Rouhaud (#22)

Re: REINDEX backend filtering

On Mar 14, 2021, at 12:10 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

v6 attached, rebase only due to conflict with recent commit.

Hi Julien,

I'm coming to this patch quite late, perhaps too late to change design decision in time for version 14.

+	if (outdated && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_log_error("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+					 "outdated", "14");
+		exit(1);
+	}

If detection of outdated indexes were performed entirely in the frontend (reindexdb) rather than in the backend (reindex command), would reindexdb be able to connect to older servers? Looking quickly that the catalogs, it appears pg_index, pg_depend, pg_collation and a call to the SQL function pg_collation_actual_version() compared against pg_depend.refobjversion would be enough to construct a list of indexes in need of reindexing. Am I missing something here?

I understand that wouldn't help somebody wanting to reindex from psql. Is that the whole reason you went a different direction with this feature?

+ printf(_(" --outdated only process indexes having outdated depencies\n"));

typo.

+ bool outdated; /* depends on at least on deprected collation? */

typo.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#27

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Michael Paquier (#25)

Re: REINDEX backend filtering

On Mon, Mar 15, 2021 at 08:56:00AM +0900, Michael Paquier wrote:

On Sun, Mar 14, 2021 at 10:57:37PM +0800, Julien Rouhaud wrote:

Is there really a use case for reindexing a specific index and at the same time
asking for possibly ignoring it? I think we should just forbid REINDEX
(OUTDATED) INDEX. What do you think?

I think that there would be cases to be able to handle that, say if a
user wants to works on a specific set of indexes one-by-one.

If a user want to work on a specific set of indexes one at a time, then the
list of indexes is probably already retrieved from some SQL query and there's
already all necessary infrastructure to properly filter the non oudated
indexes.

There is
also the argument of inconsistency with the other commands.

Yes, but the other commands dynamically construct a list of indexes.

The only use case I see would be to process a partitioned index if some of the
underlying indexes have already been processed. IMO this is better addressed
by REINDEX TABLE.

Anyway I'll make REINDEX (OUTDATED) INDEX to maybe reindex the explicitely
stated index name since you think it's a better behavior.

I was thinking that users would be more interested in the list of indexes being
processed rather than the full list of indexes and a mention of whether it was
processed or not. I can change that if you prefer.

How invasive do you think it would be to add a note in the verbose
output when indexes are skipped?

Probably not too invasive, but the verbose output is already inconsistent:

# reindex (verbose) table tt;
NOTICE: 00000: table "tt" has no indexes to reindex

But a REINDEX (VERBOSE) DATABASE won't emit such message. I'm assuming that
it's because it doesn't make sense to warn in that case as the user didn't
explicitly specified the table name. We have the same behavior for now when
using the OUTDATED option if no indexes are processed. Should that be changed
too?

Did you mean index_has_outdated_collation() and
index_has_outdated_dependency()? It's just to keep things separated, mostly
for future improvements on that infrastructure. I can get rid of that function
and put back the code in index_has_outadted_dependency() if that's overkill.

Yes, sorry. I meant index_has_outdated_collation() and
index_has_outdated_dependency().

And are you ok with this function?

#28

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Mark Dilger (#26)

Re: REINDEX backend filtering

Hi Mark,

On Sun, Mar 14, 2021 at 05:01:20PM -0700, Mark Dilger wrote:

On Mar 14, 2021, at 12:10 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

I'm coming to this patch quite late, perhaps too late to change design decision in time for version 14.

Thanks for lookint at it!

+	if (outdated && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_log_error("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+					 "outdated", "14");
+		exit(1);
+	}
If detection of outdated indexes were performed entirely in the frontend (reindexdb) rather than in the backend (reindex command), would reindexdb be able to connect to older servers? Looking quickly that the catalogs, it appears pg_index, pg_depend, pg_collation and a call to the SQL function pg_collation_actual_version() compared against pg_depend.refobjversion would be enough to construct a list of indexes in need of reindexing. Am I missing something here?

There won't be any need to connect on older servers if the patch is committed
in this commitfest as refobjversion was also added in pg14.

I understand that wouldn't help somebody wanting to reindex from psql. Is that the whole reason you went a different direction with this feature?

This was already discussed with Magnus and Michael. The main reason for that
are:

- no need for a whole new infrastructure to be able to process a list of
indexes in parallel which would be required if getting the list of indexes in
the client

- if done in the backend, then the ability is immediately available for all
user scripts, compared to the burden of writing the needed query (with the
usual caveats like quoting, qualifying all objects if the search_path isn't
safe and such) and looping though all the results.

+ printf(_(" --outdated only process indexes having outdated depencies\n"));

typo.

+ bool outdated; /* depends on at least on deprected collation? */

typo.

Thanks! I'll fix those.

#29

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Julien Rouhaud (#27)

2 attachment(s)

Re: REINDEX backend filtering

Please find attached v7, with the following changes:

- all typo reported by Michael and Mark are fixed
- REINDEX (OUTDATED) INDEX will now ignore the index if it doesn't have any
outdated dependency. Partitioned index are correctly handled.
- REINDEX (OUTDATED, VERBOSE) will now inform caller of ignored indexes, with
lines of the form:

NOTICE: index "index_name" has no outdated dependency

- updated regression tests to cover all those changes. I kept the current
approach of using simple SQL test listing the ignored indexes. I also added
some OUDATED option to collate.icu.utf8 tests so that we also check that both
REINDEX and REINDEX(OUTDATED) work as expected.
- move pg_index_has_outdated_dependency to 0002

I didn't remove index_has_outdated_collation() for now.

Attachments:

v7-0001-Add-a-new-OUTDATED-filtering-facility-for-REINDEX.patchtext/x-diff; charset=us-asciiDownload

From 91bcd6e4565164314eb6444635ca274695de3748 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 3 Dec 2020 15:54:42 +0800
Subject: [PATCH v7 1/2] Add a new OUTDATED filtering facility for REINDEX
 command.

OUTDATED is added a new unreserved keyword.

When used, REINDEX will only process indexes that have an outdated dependency.
For now, only dependency on collations are supported but we'll likely support
other kind of dependency in the future.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Michael Paquier, Mark Dilger
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 doc/src/sgml/ref/reindex.sgml                 | 12 +++
 src/backend/access/index/indexam.c            | 59 ++++++++++++++
 src/backend/catalog/index.c                   | 79 ++++++++++++++++++-
 src/backend/commands/indexcmds.c              | 37 ++++++++-
 src/backend/parser/gram.y                     |  4 +-
 src/backend/utils/cache/relcache.c            | 47 +++++++++++
 src/bin/psql/tab-complete.c                   |  2 +-
 src/include/access/genam.h                    |  1 +
 src/include/catalog/index.h                   |  3 +
 src/include/parser/kwlist.h                   |  1 +
 src/include/utils/relcache.h                  |  1 +
 .../regress/expected/collate.icu.utf8.out     | 12 ++-
 src/test/regress/expected/create_index.out    | 27 +++++++
 src/test/regress/sql/collate.icu.utf8.sql     | 12 ++-
 src/test/regress/sql/create_index.sql         | 18 +++++
 15 files changed, 301 insertions(+), 14 deletions(-)

diff --git a/doc/src/sgml/ref/reindex.sgml b/doc/src/sgml/ref/reindex.sgml
index ff4dba8c36..c4749c338b 100644
--- a/doc/src/sgml/ref/reindex.sgml
+++ b/doc/src/sgml/ref/reindex.sgml
@@ -26,6 +26,7 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
 <phrase>where <replaceable class="parameter">option</replaceable> can be one of:</phrase>
 
     CONCURRENTLY [ <replaceable class="parameter">boolean</replaceable> ]
+    OUTDATED [ <replaceable class="parameter">boolean</replaceable> ]
     TABLESPACE <replaceable class="parameter">new_tablespace</replaceable>
     VERBOSE [ <replaceable class="parameter">boolean</replaceable> ]
 </synopsis>
@@ -188,6 +189,17 @@ REINDEX [ ( <replaceable class="parameter">option</replaceable> [, ...] ) ] { IN
     </listitem>
    </varlistentry>
 
+   <varlistentry>
+    <term><literal>OUTDATED</literal></term>
+    <listitem>
+     <para>
+      This option can be used to filter the list of indexes to rebuild and only
+      process indexes that have outdated dependencies.  Fow now, the only
+      dependency handled currently is the collation provider version.
+     </para>
+    </listitem>
+   </varlistentry>
+
    <varlistentry>
     <term><literal>TABLESPACE</literal></term>
     <listitem>
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 3d2dbed708..dc1c85cf0d 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -145,6 +145,65 @@ index_open(Oid relationId, LOCKMODE lockmode)
 	return r;
 }
 
+/* ----------------
+ *		try_index_open - open an index relation by relation OID
+ *
+ *		Same as index_open, except return NULL instead of failing
+ *		if the index does not exist.
+ * ----------------
+ */
+Relation
+try_index_open(Oid relationId, LOCKMODE lockmode)
+{
+	Relation	r;
+
+	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);
+
+	/* Get the lock first */
+	if (lockmode != NoLock)
+		LockRelationOid(relationId, lockmode);
+
+	/*
+	 * Now that we have the lock, probe to see if the relation really exists
+	 * or not.
+	 */
+	if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(relationId)))
+	{
+		/* Release useless lock */
+		if (lockmode != NoLock)
+			UnlockRelationOid(relationId, lockmode);
+
+		return NULL;
+	}
+
+	/* Should be safe to do a relcache load */
+	r = RelationIdGetRelation(relationId);
+
+	if (!RelationIsValid(r))
+		elog(ERROR, "could not open relation with OID %u", relationId);
+
+	/* If we didn't get the lock ourselves, assert that caller holds one */
+	Assert(lockmode != NoLock ||
+		   CheckRelationLockedByMe(r, AccessShareLock, true));
+
+	if (r->rd_rel->relkind != RELKIND_INDEX &&
+		r->rd_rel->relkind != RELKIND_PARTITIONED_INDEX)
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("\"%s\" is not an index",
+						RelationGetRelationName(r))));
+	}
+
+	/* Make note that we've accessed a temporary relation */
+	if (RelationUsesLocalBuffers(r))
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
+
+	pgstat_initstats(r);
+
+	return r;
+}
+
 /* ----------------
  *		index_close - close an index relation
  *
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 4ef61b5efd..47e6a54149 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -100,6 +100,12 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct
+{
+	Oid relid;	/* targetr index oid */
+	bool outdated;	/* depends on at least on deprecated collation? */
+} IndexHasOutdatedColl;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -1351,6 +1357,77 @@ index_check_collation_versions(Oid relid)
 	list_free(context.warned_colls);
 }
 
+/*
+ * Detect if an index depends on at least one outdated collation.
+ * This is a callback for visitDependenciesOf().
+ */
+static bool
+do_check_index_has_outdated_collation(const ObjectAddress *otherObject,
+										const char *version,
+										char **new_version,
+										void *data)
+{
+	IndexHasOutdatedColl *context = data;
+	char *current_version;
+
+	/* We only care about dependencies on collations. */
+	if (otherObject->classId != CollationRelationId)
+		return false;
+
+	/* Fast exit if we already found a outdated collation version. */
+	if (context->outdated)
+		return false;
+
+	/* Ask the provider for the current version.  Give up if unsupported. */
+	current_version = get_collation_version_for_oid(otherObject->objectId,
+													false);
+	if (!current_version)
+		return false;
+
+	if (!version || strcmp(version, current_version) != 0)
+		context->outdated = true;
+
+	return false;
+}
+
+/*
+ * Check whether the given index has a dependency with an outdated
+ * collation version.
+ * Caller must hold a suitable lock and make sure that the given Oid belongs to
+ * an index.
+ */
+bool
+index_has_outdated_collation(Oid indexOid)
+{
+	ObjectAddress object;
+	IndexHasOutdatedColl context;
+
+	object.classId = RelationRelationId;
+	object.objectId = indexOid;
+	object.objectSubId = 0;
+
+	context.relid = indexOid;
+	context.outdated = false;
+
+	visitDependenciesOf(&object, &do_check_index_has_outdated_collation,
+						&context);
+
+	return context.outdated;
+}
+
+/*
+ * Check whether the given index has a dependency with an outdated
+ * refobjversion.
+ * Caller must hold a suitable lock and make sure that the given Oid belongs to
+ * an index.
+ * For now, only dependency on collations are supported.
+ */
+bool
+index_has_outdated_dependency(Oid indexOid)
+{
+	return index_has_outdated_collation(indexOid);
+}
+
 /*
  * Update the version for collations.  A callback for visitDependenciesOf().
  */
@@ -3991,7 +4068,7 @@ reindex_relation(Oid relid, int flags, ReindexParams *params)
 	 * relcache to get this with a sequential scan if ignoring system
 	 * indexes.)
 	 */
-	indexIds = RelationGetIndexList(rel);
+	indexIds = RelationGetIndexListFiltered(rel, params->options);
 
 	if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)
 	{
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8bc652ecd3..90ea95f40d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -2484,6 +2484,7 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 	bool		concurrently = false;
 	bool		verbose = false;
 	char	   *tablespacename = NULL;
+	bool		outdated_filter = false;
 
 	/* Parse option list */
 	foreach(lc, stmt->params)
@@ -2496,6 +2497,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 			concurrently = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "tablespace") == 0)
 			tablespacename = defGetString(opt);
+		else if (strcmp(opt->defname, "outdated") == 0)
+			outdated_filter = defGetBoolean(opt);
 		else
 			ereport(ERROR,
 					(errcode(ERRCODE_SYNTAX_ERROR),
@@ -2510,7 +2513,8 @@ ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel)
 
 	params.options =
 		(verbose ? REINDEXOPT_VERBOSE : 0) |
-		(concurrently ? REINDEXOPT_CONCURRENTLY : 0);
+		(concurrently ? REINDEXOPT_CONCURRENTLY : 0) |
+		(outdated_filter ? REINDEXOPT_OUTDATED : 0);
 
 	/*
 	 * Assign the tablespace OID to move indexes to, with InvalidOid to do
@@ -2605,6 +2609,20 @@ ReindexIndex(RangeVar *indexRelation, ReindexParams *params, bool isTopLevel)
 	persistence = get_rel_persistence(indOid);
 	relkind = get_rel_relkind(indOid);
 
+	/*
+	 * If the index isn't partitioned, we can detect here if it has any oudated
+	 * dependency.
+	 */
+	if (relkind == RELKIND_INDEX &&
+		(params->options & REINDEXOPT_OUTDATED) != 0 &&
+		!index_has_outdated_dependency(indOid))
+	{
+		ereport(NOTICE,
+				(errmsg("index \"%s\" has no outdated dependency",
+						get_rel_name(indOid))));
+		return;
+	}
+
 	if (relkind == RELKIND_PARTITIONED_INDEX)
 		ReindexPartitions(indOid, params, isTopLevel);
 	else if ((params->options & REINDEXOPT_CONCURRENTLY) != 0 &&
@@ -3049,6 +3067,17 @@ ReindexPartitions(Oid relid, ReindexParams *params, bool isTopLevel)
 		Assert(partkind == RELKIND_INDEX ||
 			   partkind == RELKIND_RELATION);
 
+		/* Ignore indexes that don't depend on outdated dependency if needed */
+		if (partkind == RELKIND_INDEX &&
+			(params->options & REINDEXOPT_OUTDATED) != 0 &&
+			!index_has_outdated_dependency(partoid))
+		{
+			ereport(NOTICE,
+					(errmsg("index \"%s\" has no outdated dependency",
+							get_rel_name(partoid))));
+			continue;
+		}
+
 		/* Save partition OID */
 		old_context = MemoryContextSwitchTo(reindex_context);
 		partitions = lappend_oid(partitions, partoid);
@@ -3307,7 +3336,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 									RelationGetRelationName(heapRelation))));
 
 				/* Add all the valid indexes of relation to list */
-				foreach(lc, RelationGetIndexList(heapRelation))
+				foreach(lc, RelationGetIndexListFiltered(heapRelation,
+														 params->options))
 				{
 					Oid			cellOid = lfirst_oid(lc);
 					Relation	indexRelation = index_open(cellOid,
@@ -3359,7 +3389,8 @@ ReindexRelationConcurrently(Oid relationOid, ReindexParams *params)
 
 					MemoryContextSwitchTo(oldcontext);
 
-					foreach(lc2, RelationGetIndexList(toastRelation))
+					foreach(lc2, RelationGetIndexListFiltered(toastRelation,
+															  params->options))
 					{
 						Oid			cellOid = lfirst_oid(lc2);
 						Relation	indexRelation = index_open(cellOid,
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 652be0b96d..a67cfa867c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -674,7 +674,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	NULLS_P NUMERIC
 
 	OBJECT_P OF OFF OFFSET OIDS OLD ON ONLY OPERATOR OPTION OPTIONS OR
-	ORDER ORDINALITY OTHERS OUT_P OUTER_P
+	ORDER ORDINALITY OTHERS OUT_P OUTDATED OUTER_P
 	OVER OVERLAPS OVERLAY OVERRIDING OWNED OWNER
 
 	PARALLEL PARSER PARTIAL PARTITION PASSING PASSWORD PLACING PLANS POLICY
@@ -15430,6 +15430,7 @@ unreserved_keyword:
 			| OPTIONS
 			| ORDINALITY
 			| OTHERS
+			| OUTDATED
 			| OVER
 			| OVERRIDING
 			| OWNED
@@ -16002,6 +16003,7 @@ bare_label_keyword:
 			| ORDINALITY
 			| OTHERS
 			| OUT_P
+			| OUTDATED
 			| OUTER_P
 			| OVERLAY
 			| OVERRIDING
diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 7ef510cd01..94fc7b1c9e 100644
--- a/src/backend/utils/cache/relcache.c
+++ b/src/backend/utils/cache/relcache.c
@@ -4620,6 +4620,53 @@ RelationGetIndexList(Relation relation)
 	return result;
 }
 
+/*
+ * RelationGetIndexListFiltered -- get a filtered list of indexes on this
+ * relation.
+ *
+ * Calls RelationGetIndexList and only keep indexes that have an outdated
+ * dependency.  For now, only collation version dependency is supported.
+ */
+List *
+RelationGetIndexListFiltered(Relation relation, bits32 options)
+{
+	List	   *result,
+			   *full_list;
+	ListCell   *lc;
+
+	full_list = RelationGetIndexList(relation);
+
+	/* Fast exit if no filtering was asked, or if the list if empty. */
+	if (((options & REINDEXOPT_OUTDATED) == 0) || full_list == NIL)
+		return full_list;
+
+	result = NIL;
+	foreach(lc, full_list)
+	{
+		Oid		indexOid = lfirst_oid(lc);
+
+		Assert(get_rel_relkind(indexOid) == RELKIND_INDEX ||
+			   get_rel_relkind(indexOid) == RELKIND_PARTITIONED_INDEX);
+
+		/*
+		 * Check for any oudated dependency.
+		 */
+		if (index_has_outdated_dependency(indexOid))
+		{
+			result = lappend_oid(result, indexOid);
+			continue;
+		}
+
+		/* Didn't find any outdated dependency, index will be ignored. */
+		if (((options & REINDEXOPT_OUTDATED) != 0))
+			ereport(NOTICE,
+					(errmsg("index \"%s\" has no outdated dependency",
+							get_rel_name(indexOid))));
+	}
+
+	return result;
+}
+
 /*
  * RelationGetStatExtList
  *		get a list of OIDs of statistics objects on this relation
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index ecdb8d752b..0843f36580 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3674,7 +3674,7 @@ psql_completion(const char *text, int start, int end)
 		 * one word, so the above test is correct.
 		 */
 		if (ends_with(prev_wd, '(') || ends_with(prev_wd, ','))
-			COMPLETE_WITH("CONCURRENTLY", "TABLESPACE", "VERBOSE");
+			COMPLETE_WITH("CONCURRENTLY", "OUTDATED'", "TABLESPACE", "VERBOSE");
 		else if (TailMatches("TABLESPACE"))
 			COMPLETE_WITH_QUERY(Query_for_list_of_tablespaces);
 	}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 4515401869..fab5dbc101 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -138,6 +138,7 @@ typedef struct IndexOrderByDistance
 #define IndexScanIsValid(scan) PointerIsValid(scan)
 
 extern Relation index_open(Oid relationId, LOCKMODE lockmode);
+extern Relation try_index_open(Oid relationId, LOCKMODE lockmode);
 extern void index_close(Relation relation, LOCKMODE lockmode);
 
 extern bool index_insert(Relation indexRelation,
diff --git a/src/include/catalog/index.h b/src/include/catalog/index.h
index e22d506436..298c0c633c 100644
--- a/src/include/catalog/index.h
+++ b/src/include/catalog/index.h
@@ -42,6 +42,7 @@ typedef struct ReindexParams
 #define REINDEXOPT_REPORT_PROGRESS 0x02 /* report pgstat progress */
 #define REINDEXOPT_MISSING_OK 	0x04	/* skip missing relations */
 #define REINDEXOPT_CONCURRENTLY	0x08	/* concurrent mode */
+#define REINDEXOPT_OUTDATED		0x10/* outdated collation only */
 
 /* state info for validate_index bulkdelete callback */
 typedef struct ValidateIndexState
@@ -137,6 +138,8 @@ extern void FormIndexDatum(IndexInfo *indexInfo,
 						   bool *isnull);
 
 extern void index_check_collation_versions(Oid relid);
+extern bool index_has_outdated_collation(Oid indexOid);
+extern bool index_has_outdated_dependency(Oid indexOid);
 extern void index_update_collation_versions(Oid relid, Oid coll);
 
 extern void index_build(Relation heapRelation,
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 28083aaac9..e6c725d9a6 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -295,6 +295,7 @@ PG_KEYWORD("order", ORDER, RESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("ordinality", ORDINALITY, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("others", OTHERS, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("out", OUT_P, COL_NAME_KEYWORD, BARE_LABEL)
+PG_KEYWORD("outdated", OUTDATED, UNRESERVED_KEYWORD, BARE_LABEL)
 PG_KEYWORD("outer", OUTER_P, TYPE_FUNC_NAME_KEYWORD, BARE_LABEL)
 PG_KEYWORD("over", OVER, UNRESERVED_KEYWORD, AS_LABEL)
 PG_KEYWORD("overlaps", OVERLAPS, TYPE_FUNC_NAME_KEYWORD, AS_LABEL)
diff --git a/src/include/utils/relcache.h b/src/include/utils/relcache.h
index 2fcdf79323..a7a2272abd 100644
--- a/src/include/utils/relcache.h
+++ b/src/include/utils/relcache.h
@@ -45,6 +45,7 @@ extern void RelationClose(Relation relation);
  */
 extern List *RelationGetFKeyList(Relation relation);
 extern List *RelationGetIndexList(Relation relation);
+extern List *RelationGetIndexListFiltered(Relation relation, bits32 options);
 extern List *RelationGetStatExtList(Relation relation);
 extern Oid	RelationGetPrimaryKeyIndex(Relation relation);
 extern Oid	RelationGetReplicaIndex(Relation relation);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index de70cb1212..001f091fb3 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -2094,9 +2094,11 @@ UPDATE pg_depend SET refobjversion = 'not a version'
 WHERE refclassid = 'pg_collation'::regclass
 AND objid::regclass::text LIKE 'icuidx%'
 AND refobjversion IS NOT NULL;
-REINDEX TABLE collate_test;
-REINDEX TABLE collate_part_0;
+SET client_min_messages to WARNING; -- pg_toast index names aren't stable
+REINDEX (OUTDATED) TABLE collate_test;
+REINDEX (OUTDATED) TABLE collate_part_0;
 REINDEX TABLE collate_part_1;
+RESET client_min_messages;
 SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
  objid 
 -------
@@ -2107,9 +2109,11 @@ UPDATE pg_depend SET refobjversion = 'not a version'
 WHERE refclassid = 'pg_collation'::regclass
 AND objid::regclass::text LIKE 'icuidx%'
 AND refobjversion IS NOT NULL;
-REINDEX TABLE CONCURRENTLY collate_test;
+SET client_min_messages to WARNING; -- pg_toast index names aren't stable
+REINDEX (OUTDATED) TABLE CONCURRENTLY collate_test;
 REINDEX TABLE CONCURRENTLY collate_part_0;
-REINDEX INDEX CONCURRENTLY icuidx17_part;
+REINDEX (OUTDATED) INDEX CONCURRENTLY icuidx17_part;
+RESET client_min_messages;
 SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
  objid 
 -------
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 830fdddf24..959fad6b71 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2018,6 +2018,33 @@ INFO:  index "reindex_verbose_pkey" was reindexed
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 --
+-- REINDEX (OUTDATED)
+--
+CREATE TABLE reindex_coll(id integer primary key) PARTITION BY LIST (id);
+CREATE TABLE reindex_coll_1 PARTITION OF reindex_coll FOR VALUES IN (1);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (VERBOSE, OUTDATED) TABLE reindex_coll;
+NOTICE:  index "reindex_coll_1_pkey" has no outdated dependency
+REINDEX (VERBOSE, OUTDATED) INDEX reindex_coll_pkey;
+NOTICE:  index "reindex_coll_1_pkey" has no outdated dependency
+REINDEX (VERBOSE, OUTDATED) TABLE reindex_coll_1;
+NOTICE:  index "reindex_coll_1_pkey" has no outdated dependency
+NOTICE:  table "reindex_coll_1" has no indexes to reindex
+REINDEX (VERBOSE, OUTDATED) INDEX reindex_coll_1_pkey;
+NOTICE:  index "reindex_coll_1_pkey" has no outdated dependency
+REINDEX (VERBOSE, OUTDATED, CONCURRENTLY) TABLE reindex_coll;
+NOTICE:  index "reindex_coll_1_pkey" has no outdated dependency
+REINDEX (VERBOSE, OUTDATED, CONCURRENTLY) INDEX reindex_coll_pkey;
+NOTICE:  index "reindex_coll_1_pkey" has no outdated dependency
+REINDEX (VERBOSE, OUTDATED, CONCURRENTLY) TABLE reindex_coll_1;
+NOTICE:  index "reindex_coll_1_pkey" has no outdated dependency
+NOTICE:  table "reindex_coll_1" has no indexes that can be reindexed concurrently
+REINDEX (VERBOSE, OUTDATED, CONCURRENTLY) INDEX reindex_coll_1_pkey;
+NOTICE:  index "reindex_coll_1_pkey" has no outdated dependency
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+--
 -- REINDEX CONCURRENTLY
 --
 CREATE TABLE concur_reindex_tab (c1 int);
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index dd5d208854..a6504512f7 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -836,9 +836,11 @@ WHERE refclassid = 'pg_collation'::regclass
 AND objid::regclass::text LIKE 'icuidx%'
 AND refobjversion IS NOT NULL;
 
-REINDEX TABLE collate_test;
-REINDEX TABLE collate_part_0;
+SET client_min_messages to WARNING; -- pg_toast index names aren't stable
+REINDEX (OUTDATED) TABLE collate_test;
+REINDEX (OUTDATED) TABLE collate_part_0;
 REINDEX TABLE collate_part_1;
+RESET client_min_messages;
 
 SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
 
@@ -847,9 +849,11 @@ UPDATE pg_depend SET refobjversion = 'not a version'
 WHERE refclassid = 'pg_collation'::regclass
 AND objid::regclass::text LIKE 'icuidx%'
 AND refobjversion IS NOT NULL;
-REINDEX TABLE CONCURRENTLY collate_test;
+SET client_min_messages to WARNING; -- pg_toast index names aren't stable
+REINDEX (OUTDATED) TABLE CONCURRENTLY collate_test;
 REINDEX TABLE CONCURRENTLY collate_part_0;
-REINDEX INDEX CONCURRENTLY icuidx17_part;
+REINDEX (OUTDATED) INDEX CONCURRENTLY icuidx17_part;
+RESET client_min_messages;
 
 SELECT objid::regclass FROM pg_depend WHERE refobjversion = 'not a version';
 
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 8bc76f7c6f..4da6e2b5e8 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -790,6 +790,24 @@ REINDEX (VERBOSE) TABLE reindex_verbose;
 \set VERBOSITY default
 DROP TABLE reindex_verbose;
 
+--
+-- REINDEX (OUTDATED)
+--
+CREATE TABLE reindex_coll(id integer primary key) PARTITION BY LIST (id);
+CREATE TABLE reindex_coll_1 PARTITION OF reindex_coll FOR VALUES IN (1);
+\set VERBOSITY terse \\ -- suppress machine-dependent details
+-- no suitable index should be found
+REINDEX (VERBOSE, OUTDATED) TABLE reindex_coll;
+REINDEX (VERBOSE, OUTDATED) INDEX reindex_coll_pkey;
+REINDEX (VERBOSE, OUTDATED) TABLE reindex_coll_1;
+REINDEX (VERBOSE, OUTDATED) INDEX reindex_coll_1_pkey;
+REINDEX (VERBOSE, OUTDATED, CONCURRENTLY) TABLE reindex_coll;
+REINDEX (VERBOSE, OUTDATED, CONCURRENTLY) INDEX reindex_coll_pkey;
+REINDEX (VERBOSE, OUTDATED, CONCURRENTLY) TABLE reindex_coll_1;
+REINDEX (VERBOSE, OUTDATED, CONCURRENTLY) INDEX reindex_coll_1_pkey;
+\set VERBOSITY default
+DROP TABLE reindex_coll;
+
 --
 -- REINDEX CONCURRENTLY
 --
-- 
2.30.1

v7-0002-Add-a-outdated-option-to-reindexdb.patchtext/x-diff; charset=us-asciiDownload

From 28c12ae6f353abec9e4bcb141d287ab981298353 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud <julien.rouhaud@free.fr>
Date: Thu, 25 Feb 2021 01:33:58 +0800
Subject: [PATCH v7 2/2] Add a --outdated option to reindexdb

This uses the new OUTDATED option for REINDEX.  If user asks for multiple job,
the list of tables to process will be sorted by the total size of underlying
indexes that have outdated dependency using a new
pg_index_has_outdated_dependency(regclass) SQL function.

Catversion (should be) bumped.

Author: Julien Rouhaud <rjuju123@gmail.com>
Reviewed-by: Michael Paquier, Mark Dilger
Discussion: https://postgr.es/m/20201203093143.GA64934%40nol
---
 doc/src/sgml/func.sgml             |  27 ++++--
 src/backend/catalog/index.c        |  22 +++++
 src/bin/scripts/reindexdb.c        | 143 ++++++++++++++++++++++++-----
 src/bin/scripts/t/090_reindexdb.pl |  34 ++++++-
 src/include/catalog/pg_proc.dat    |   4 +
 5 files changed, 198 insertions(+), 32 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 9492a3c6b9..0eda6678ac 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -26448,12 +26448,13 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
 
    <para>
     <xref linkend="functions-admin-index-table"/> shows the functions
-    available for index maintenance tasks.  (Note that these maintenance
-    tasks are normally done automatically by autovacuum; use of these
-    functions is only required in special cases.)
-    These functions cannot be executed during recovery.
-    Use of these functions is restricted to superusers and the owner
-    of the given index.
+    available for index maintenance tasks.  (Note that the maintenance
+    tasks performing actions on indexes are normally done automatically by
+    autovacuum; use of these functions is only required in special cases.)
+    The functions performing actions on indexes cannot be executed during
+    recovery.
+    Use of the functions performing actions on indexes is restricted to
+    superusers and the owner of the given index.
    </para>
 
    <table id="functions-admin-index-table">
@@ -26538,6 +26539,20 @@ SELECT pg_size_pretty(sum(pg_relation_size(relid))) AS total_size
         option.
        </para></entry>
       </row>
+
+      <row>
+       <entry role="func_table_entry"><para role="func_signature">
+        <indexterm>
+         <primary>pg_index_has_outdated_dependency</primary>
+        </indexterm>
+        <function>pg_index_has_outdated_dependency</function> ( <parameter>index</parameter> <type>regclass</type> )
+        <returnvalue>boolean</returnvalue>
+       </para>
+       <para>
+        Check if the specified index has any outdated dependency.  For now only
+        dependency on collations are supported.
+       </para></entry>
+      </row>
      </tbody>
     </tgroup>
    </table>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 47e6a54149..6848e61df3 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1390,6 +1390,28 @@ do_check_index_has_outdated_collation(const ObjectAddress *otherObject,
 	return false;
 }
 
+/*
+ * SQL wrapper around index_has_outdated_dependency.
+ */
+Datum
+pg_index_has_outdated_dependency(PG_FUNCTION_ARGS)
+{
+	Oid			indexOid = PG_GETARG_OID(0);
+	Relation	rel;
+	bool		res;
+
+	rel = try_index_open(indexOid, AccessShareLock);
+
+	if (rel == NULL)
+		PG_RETURN_NULL();
+
+	res = index_has_outdated_dependency(indexOid);
+
+	relation_close(rel, AccessShareLock);
+
+	PG_RETURN_BOOL(res);
+}
+
 /*
  * Check whether the given index has a dependency with an outdated
  * collation version.
diff --git a/src/bin/scripts/reindexdb.c b/src/bin/scripts/reindexdb.c
index fc0681538a..d252b1454f 100644
--- a/src/bin/scripts/reindexdb.c
+++ b/src/bin/scripts/reindexdb.c
@@ -35,20 +35,23 @@ typedef enum ReindexType
 static SimpleStringList *get_parallel_object_list(PGconn *conn,
 												  ReindexType type,
 												  SimpleStringList *user_list,
+												  bool outdated,
 												  bool echo);
 static void reindex_one_database(ConnParams *cparams, ReindexType type,
 								 SimpleStringList *user_list,
 								 const char *progname,
 								 bool echo, bool verbose, bool concurrently,
-								 int concurrentCons, const char *tablespace);
+								 int concurrentCons, const char *tablespace,
+								 bool outdated);
 static void reindex_all_databases(ConnParams *cparams,
 								  const char *progname, bool echo,
 								  bool quiet, bool verbose, bool concurrently,
-								  int concurrentCons, const char *tablespace);
+								  int concurrentCons, const char *tablespace,
+								  bool outdated);
 static void run_reindex_command(PGconn *conn, ReindexType type,
 								const char *name, bool echo, bool verbose,
 								bool concurrently, bool async,
-								const char *tablespace);
+								const char *tablespace, bool outdated);
 
 static void help(const char *progname);
 
@@ -74,6 +77,7 @@ main(int argc, char *argv[])
 		{"concurrently", no_argument, NULL, 1},
 		{"maintenance-db", required_argument, NULL, 2},
 		{"tablespace", required_argument, NULL, 3},
+		{"outdated", no_argument, NULL, 4},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -95,6 +99,7 @@ main(int argc, char *argv[])
 	bool		quiet = false;
 	bool		verbose = false;
 	bool		concurrently = false;
+	bool		outdated = false;
 	SimpleStringList indexes = {NULL, NULL};
 	SimpleStringList tables = {NULL, NULL};
 	SimpleStringList schemas = {NULL, NULL};
@@ -170,6 +175,9 @@ main(int argc, char *argv[])
 			case 3:
 				tablespace = pg_strdup(optarg);
 				break;
+			case 4:
+				outdated = true;
+				break;
 			default:
 				fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
 				exit(1);
@@ -234,7 +242,8 @@ main(int argc, char *argv[])
 		cparams.dbname = maintenance_db;
 
 		reindex_all_databases(&cparams, progname, echo, quiet, verbose,
-							  concurrently, concurrentCons, tablespace);
+							  concurrently, concurrentCons, tablespace,
+							  outdated);
 	}
 	else if (syscatalog)
 	{
@@ -253,12 +262,17 @@ main(int argc, char *argv[])
 			pg_log_error("cannot reindex specific index(es) and system catalogs at the same time");
 			exit(1);
 		}
-
 		if (concurrentCons > 1)
 		{
 			pg_log_error("cannot use multiple jobs to reindex system catalogs");
 			exit(1);
 		}
+		if (outdated)
+		{
+			pg_log_error("cannot filter indexes having outdated dependencies "
+						 "and reindex system catalogs at the same time");
+			exit(1);
+		}
 
 		if (dbname == NULL)
 		{
@@ -274,7 +288,7 @@ main(int argc, char *argv[])
 
 		reindex_one_database(&cparams, REINDEX_SYSTEM, NULL,
 							 progname, echo, verbose,
-							 concurrently, 1, tablespace);
+							 concurrently, 1, tablespace, outdated);
 	}
 	else
 	{
@@ -304,17 +318,20 @@ main(int argc, char *argv[])
 		if (schemas.head != NULL)
 			reindex_one_database(&cparams, REINDEX_SCHEMA, &schemas,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 
 		if (indexes.head != NULL)
 			reindex_one_database(&cparams, REINDEX_INDEX, &indexes,
 								 progname, echo, verbose,
-								 concurrently, 1, tablespace);
+								 concurrently, 1, tablespace,
+								 outdated);
 
 		if (tables.head != NULL)
 			reindex_one_database(&cparams, REINDEX_TABLE, &tables,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 
 		/*
 		 * reindex database only if neither index nor table nor schema is
@@ -323,7 +340,8 @@ main(int argc, char *argv[])
 		if (indexes.head == NULL && tables.head == NULL && schemas.head == NULL)
 			reindex_one_database(&cparams, REINDEX_DATABASE, NULL,
 								 progname, echo, verbose,
-								 concurrently, concurrentCons, tablespace);
+								 concurrently, concurrentCons, tablespace,
+								 outdated);
 	}
 
 	exit(0);
@@ -334,7 +352,7 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 					 SimpleStringList *user_list,
 					 const char *progname, bool echo,
 					 bool verbose, bool concurrently, int concurrentCons,
-					 const char *tablespace)
+					 const char *tablespace, bool outdated)
 {
 	PGconn	   *conn;
 	SimpleStringListCell *cell;
@@ -363,6 +381,14 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 		exit(1);
 	}
 
+	if (outdated && PQserverVersion(conn) < 140000)
+	{
+		PQfinish(conn);
+		pg_log_error("cannot use the \"%s\" option on server versions older than PostgreSQL %s",
+					 "outdated", "14");
+		exit(1);
+	}
+
 	if (!parallel)
 	{
 		switch (process_type)
@@ -399,14 +425,24 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 				 */
 				if (concurrently)
 					pg_log_warning("cannot reindex system catalogs concurrently, skipping all");
+				else if (outdated)
+				{
+					/*
+					 * The only supported kind of object that can be outdated
+					 * is collation.  No system catalog has any index that can
+					 * depend on an outdated collation, so skip system
+					 * catalogs.
+					 */
+				}
 				else
 					run_reindex_command(conn, REINDEX_SYSTEM, PQdb(conn), echo,
 										verbose, concurrently, false,
-										tablespace);
+										tablespace, outdated);
 
 				/* Build a list of relations from the database */
 				process_list = get_parallel_object_list(conn, process_type,
-														user_list, echo);
+														user_list, outdated,
+														echo);
 				process_type = REINDEX_TABLE;
 
 				/* Bail out if nothing to process */
@@ -419,7 +455,8 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 
 				/* Build a list of relations from all the schemas */
 				process_list = get_parallel_object_list(conn, process_type,
-														user_list, echo);
+														user_list, outdated,
+														echo);
 				process_type = REINDEX_TABLE;
 
 				/* Bail out if nothing to process */
@@ -485,7 +522,8 @@ reindex_one_database(ConnParams *cparams, ReindexType type,
 
 		ParallelSlotSetHandler(free_slot, TableCommandResultHandler, NULL);
 		run_reindex_command(free_slot->connection, process_type, objname,
-							echo, verbose, concurrently, true, tablespace);
+							echo, verbose, concurrently, true, tablespace,
+							outdated);
 
 		cell = cell->next;
 	} while (cell != NULL);
@@ -510,7 +548,7 @@ finish:
 static void
 run_reindex_command(PGconn *conn, ReindexType type, const char *name,
 					bool echo, bool verbose, bool concurrently, bool async,
-					const char *tablespace)
+					const char *tablespace, bool outdated)
 {
 	const char *paren = "(";
 	const char *comma = ", ";
@@ -537,6 +575,12 @@ run_reindex_command(PGconn *conn, ReindexType type, const char *name,
 		sep = comma;
 	}
 
+	if (outdated)
+	{
+		appendPQExpBuffer(&sql, "%sOUTDATED", sep);
+		sep = comma;
+	}
+
 	if (sep != paren)
 		appendPQExpBufferStr(&sql, ") ");
 
@@ -642,7 +686,7 @@ run_reindex_command(PGconn *conn, ReindexType type, const char *name,
  */
 static SimpleStringList *
 get_parallel_object_list(PGconn *conn, ReindexType type,
-						 SimpleStringList *user_list, bool echo)
+						 SimpleStringList *user_list, bool outdated, bool echo)
 {
 	PQExpBufferData catalog_query;
 	PQExpBufferData buf;
@@ -661,16 +705,41 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 	{
 		case REINDEX_DATABASE:
 			Assert(user_list == NULL);
+
 			appendPQExpBufferStr(&catalog_query,
 								 "SELECT c.relname, ns.nspname\n"
 								 " FROM pg_catalog.pg_class c\n"
 								 " JOIN pg_catalog.pg_namespace ns"
-								 " ON c.relnamespace = ns.oid\n"
+								 " ON c.relnamespace = ns.oid\n");
+
+			if (outdated)
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " JOIN pg_catalog.pg_index i"
+									 " ON c.oid = i.indrelid\n"
+									 " JOIN pg_catalog.pg_class ci"
+									 " ON i.indexrelid = ci.oid\n");
+			}
+
+			appendPQExpBufferStr(&catalog_query,
 								 " WHERE ns.nspname != 'pg_catalog'\n"
 								 "   AND c.relkind IN ("
 								 CppAsString2(RELKIND_RELATION) ", "
-								 CppAsString2(RELKIND_MATVIEW) ")\n"
-								 " ORDER BY c.relpages DESC;");
+								 CppAsString2(RELKIND_MATVIEW) ")\n");
+
+			if (outdated)
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " GROUP BY c.relname, ns.nspname\n"
+									 " ORDER BY pg_catalog.sum(ci.relpages)"
+									 " FILTER (WHERE pg_catalog.pg_index_has_outdated_dependency(ci.oid)) DESC;");
+			}
+			else
+			{
+				appendPQExpBufferStr(&catalog_query,
+									 " ORDER BY c.relpages DESC;");
+			}
+
 			break;
 
 		case REINDEX_SCHEMA:
@@ -688,7 +757,18 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 									 "SELECT c.relname, ns.nspname\n"
 									 " FROM pg_catalog.pg_class c\n"
 									 " JOIN pg_catalog.pg_namespace ns"
-									 " ON c.relnamespace = ns.oid\n"
+									 " ON c.relnamespace = ns.oid\n");
+
+				if (outdated)
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " JOIN pg_catalog.pg_index i"
+										 " ON c.oid = i.indrelid\n"
+										 " JOIN pg_catalog.pg_class ci"
+										 " ON i.indexrelid = ci.oid\n");
+				}
+
+				appendPQExpBufferStr(&catalog_query,
 									 " WHERE c.relkind IN ("
 									 CppAsString2(RELKIND_RELATION) ", "
 									 CppAsString2(RELKIND_MATVIEW) ")\n"
@@ -706,8 +786,20 @@ get_parallel_object_list(PGconn *conn, ReindexType type,
 					appendStringLiteralConn(&catalog_query, nspname, conn);
 				}
 
-				appendPQExpBufferStr(&catalog_query, ")\n"
-									 " ORDER BY c.relpages DESC;");
+				appendPQExpBufferStr(&catalog_query, ")\n");
+
+				if (outdated)
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " GROUP BY c.relname, ns.nspname\n"
+										 " ORDER BY pg_catalog.sum(ci.relpages)"
+										 " FILTER (WHERE pg_catalog.pg_index_has_outdated_dependency(ci.oid)) DESC;");
+				}
+				else
+				{
+					appendPQExpBufferStr(&catalog_query,
+										 " ORDER BY c.relpages DESC;");
+				}
 			}
 			break;
 
@@ -755,7 +847,7 @@ static void
 reindex_all_databases(ConnParams *cparams,
 					  const char *progname, bool echo, bool quiet, bool verbose,
 					  bool concurrently, int concurrentCons,
-					  const char *tablespace)
+					  const char *tablespace, bool outdated)
 {
 	PGconn	   *conn;
 	PGresult   *result;
@@ -779,7 +871,7 @@ reindex_all_databases(ConnParams *cparams,
 
 		reindex_one_database(cparams, REINDEX_DATABASE, NULL,
 							 progname, echo, verbose, concurrently,
-							 concurrentCons, tablespace);
+							 concurrentCons, tablespace, outdated);
 	}
 
 	PQclear(result);
@@ -798,6 +890,7 @@ help(const char *progname)
 	printf(_("  -e, --echo                   show the commands being sent to the server\n"));
 	printf(_("  -i, --index=INDEX            recreate specific index(es) only\n"));
 	printf(_("  -j, --jobs=NUM               use this many concurrent connections to reindex\n"));
+	printf(_("      --outdated               only process indexes having outdated dependencies\n"));
 	printf(_("  -q, --quiet                  don't write any messages\n"));
 	printf(_("  -s, --system                 reindex system catalogs\n"));
 	printf(_("  -S, --schema=SCHEMA          reindex specific schema(s) only\n"));
diff --git a/src/bin/scripts/t/090_reindexdb.pl b/src/bin/scripts/t/090_reindexdb.pl
index 159b637230..91fd602041 100644
--- a/src/bin/scripts/t/090_reindexdb.pl
+++ b/src/bin/scripts/t/090_reindexdb.pl
@@ -3,7 +3,7 @@ use warnings;
 
 use PostgresNode;
 use TestLib;
-use Test::More tests => 58;
+use Test::More tests => 70;
 
 program_help_ok('reindexdb');
 program_version_ok('reindexdb');
@@ -174,6 +174,9 @@ $node->command_fails(
 $node->command_fails(
 	[ 'reindexdb', '-j', '2', '-i', 'i1', 'postgres' ],
 	'parallel reindexdb cannot process indexes');
+$node->command_fails(
+	[ 'reindexdb', '-s', '--outdated' ],
+	'cannot reindex system catalog and filter indexes having outdated dependencies');
 $node->issues_sql_like(
 	[ 'reindexdb', '-j', '2', 'postgres' ],
 	qr/statement:\ REINDEX SYSTEM postgres;
@@ -196,3 +199,32 @@ $node->command_checks_all(
 		qr/^reindexdb: warning: cannot reindex system catalogs concurrently, skipping all/s
 	],
 	'parallel reindexdb for system with --concurrently skips catalogs');
+
+# Temporarily downgrade client-min-message to get the no-op report
+$ENV{PGOPTIONS} = '--client-min-messages=NOTICE';
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-t', 's1.t1', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/table "t1" has no indexes to reindex/],
+	'verbose reindexdb for outdated dependencies on a specific table reports no-op tables');
+
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-d', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/index "t2_id_idx" has no outdated dependency/],
+	'verbose reindexdb for outdated dependencies database wide reports all ignored indexes');
+$node->command_checks_all(
+	[ 'reindexdb',  '--outdated', '-v', '-j', '2', '-d', 'postgres' ],
+	0,
+	[qr/^$/],
+	[qr/table "t1" has no indexes to reindex/],
+	'parallel verbose reindexdb for outdated dependencies database wide reports no-op tables');
+
+# Switch back to WARNING client-min-message
+$ENV{PGOPTIONS} = '--client-min-messages=WARNING';
+$node->issues_sql_like(
+	[ 'reindexdb', '--outdated', '-t', 's1.t1', 'postgres' ],
+	qr/.*statement: REINDEX \(OUTDATED\) TABLE s1\.t1;/,
+	'reindexdb for outdated dependencies specify the OUTDATED keyword');
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 93393fcfd4..911c12ee2c 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -949,6 +949,10 @@
   proname => 'pg_indexam_has_property', provolatile => 's',
   prorettype => 'bool', proargtypes => 'oid text',
   prosrc => 'pg_indexam_has_property' },
+{ oid => '8102', descr => 'test property of an index',
+  proname => 'pg_index_has_outdated_dependency', provolatile => 's',
+  prorettype => 'bool', proargtypes => 'regclass',
+  prosrc => 'pg_index_has_outdated_dependency' },
 { oid => '637', descr => 'test property of an index',
   proname => 'pg_index_has_property', provolatile => 's', prorettype => 'bool',
   proargtypes => 'regclass text', prosrc => 'pg_index_has_property' },
-- 
2.30.1

#30

Mark Dilger

mark.dilger@enterprisedb.com

almost 5 years ago

In reply to: Julien Rouhaud (#29)

Re: REINDEX backend filtering

On Mar 14, 2021, at 8:33 PM, Julien Rouhaud <rjuju123@gmail.com> wrote:

<v7-0001-Add-a-new-OUTDATED-filtering-facility-for-REINDEX.patch><v7-0002-Add-a-outdated-option-to-reindexdb.patch>

In the docs, 0001, "Fow now, the only dependency handled currently",

"Fow now" is misspelled, and "For now" seems redundant when used with "currently".

In the docs, 0002, "For now only dependency on collations are supported."

"dependency" is singular, "are" is conjugated for plural.

In the docs, 0002, you forgot to update doc/src/sgml/ref/reindexdb.sgml with the documentation for the --outdated switch.

In the tests, you check that REINDEX (OUTDATED) doesn't do anything crazy, but you are not really testing the functionality so far as I can see, as you don't have any tests which cause the collation to be outdated. Am I right about that? I wonder if you could modify DefineCollation. In addition to the providers "icu" and "libc" that it currently accepts, I wonder if it might accept "test" or similar, and then you could create a test in src/test/modules that compiles a "test" provider, creates a database with indexes dependent on something from that provider, stops the database, updates the test collation, ...?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#31

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Mark Dilger (#30)

Re: REINDEX backend filtering

On Mon, Mar 15, 2021 at 09:30:43AM -0700, Mark Dilger wrote:

In the docs, 0001, "Fow now, the only dependency handled currently",

"Fow now" is misspelled, and "For now" seems redundant when used with "currently".

In the docs, 0002, "For now only dependency on collations are supported."

"dependency" is singular, "are" is conjugated for plural.

In the docs, 0002, you forgot to update doc/src/sgml/ref/reindexdb.sgml with the documentation for the --outdated switch.

Thanks, I'll fix those and do a full round a doc / comment proofreading.

In the tests, you check that REINDEX (OUTDATED) doesn't do anything crazy, but you are not really testing the functionality so far as I can see, as you don't have any tests which cause the collation to be outdated. Am I right about that? I wonder if you could modify DefineCollation. In addition to the providers "icu" and "libc" that it currently accepts, I wonder if it might accept "test" or similar, and then you could create a test in src/test/modules that compiles a "test" provider, creates a database with indexes dependent on something from that provider, stops the database, updates the test collation, ...?

Indeed the tests in create_index.sql (and similarly in 090_reindexdb.pl) check
that REINDEX (OUTDATED) will ignore non outdated indexes as expected.

But there are also the tests in collate.icu.utf8.out which will fake outdated
collations (that's the original tests for the collation tracking patches) and
then check that outdated indexes are reindexed with both REINDEX and REINDEX
(OUDATED).

So I think that all cases are covered. Do you want to have more test cases?

#32

Mark Dilger

mark.dilger@enterprisedb.com

almost 5 years ago

In reply to: Julien Rouhaud (#31)

Re: REINDEX backend filtering

On Mar 15, 2021, at 9:52 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

But there are also the tests in collate.icu.utf8.out which will fake outdated
collations (that's the original tests for the collation tracking patches) and
then check that outdated indexes are reindexed with both REINDEX and REINDEX
(OUDATED).

So I think that all cases are covered. Do you want to have more test cases?

I thought that just checked cases where a bogus 'not a version' was put into pg_catalog.pg_depend. I'm talking about having a collation provider who returns a different version string and has genuinely different collation rules between versions, thereby breaking the index until it is updated. Is that being tested?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#33

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Mark Dilger (#32)

Re: REINDEX backend filtering

On Mon, Mar 15, 2021 at 10:13:55AM -0700, Mark Dilger wrote:

On Mar 15, 2021, at 9:52 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

But there are also the tests in collate.icu.utf8.out which will fake outdated
collations (that's the original tests for the collation tracking patches) and
then check that outdated indexes are reindexed with both REINDEX and REINDEX
(OUDATED).

So I think that all cases are covered. Do you want to have more test cases?

I thought that just checked cases where a bogus 'not a version' was put into pg_catalog.pg_depend. I'm talking about having a collation provider who returns a different version string and has genuinely different collation rules between versions, thereby breaking the index until it is updated. Is that being tested?

No, we're only checking that the infrastructure works as intended.

Are you saying that you want to implement a simplistic collation provider with
"tunable" ordering, so that you can actually check that an ordering change will
be detected as a corrupted index, as in you'll get some error or incorrect
results?

I don't think that this infrastructure is the right place to do that, and I'm
not sure what would be the benefit here. If a library was updated, the
underlying indexes may or may not be corrupted, and we only warn about the
discrepancy with a low overhead.

#34

Mark Dilger

mark.dilger@enterprisedb.com

almost 5 years ago

In reply to: Julien Rouhaud (#33)

Re: REINDEX backend filtering

On Mar 15, 2021, at 10:34 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 15, 2021 at 10:13:55AM -0700, Mark Dilger wrote:

On Mar 15, 2021, at 9:52 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

But there are also the tests in collate.icu.utf8.out which will fake outdated
collations (that's the original tests for the collation tracking patches) and
then check that outdated indexes are reindexed with both REINDEX and REINDEX
(OUDATED).

So I think that all cases are covered. Do you want to have more test cases?

I thought that just checked cases where a bogus 'not a version' was put into pg_catalog.pg_depend. I'm talking about having a collation provider who returns a different version string and has genuinely different collation rules between versions, thereby breaking the index until it is updated. Is that being tested?

No, we're only checking that the infrastructure works as intended.

Are you saying that you want to implement a simplistic collation provider with
"tunable" ordering, so that you can actually check that an ordering change will
be detected as a corrupted index, as in you'll get some error or incorrect
results?

I'm saying that your patch seems to call down to get_collation_actual_version() via get_collation_version_for_oid() from your new function do_check_index_has_outdated_collation(), but I'm not seeing how that gets exercised.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#35

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Mark Dilger (#34)

Re: REINDEX backend filtering

On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:

I'm saying that your patch seems to call down to get_collation_actual_version() via get_collation_version_for_oid() from your new function do_check_index_has_outdated_collation(), but I'm not seeing how that gets exercised.

It's a little bit late here so sorry if I'm missing something.

do_check_index_has_outdated_collation() is called from
index_has_outdated_collation() which is called from
index_has_outdated_dependency() which is called from
RelationGetIndexListFiltered(), and that function is called when passing the
OUTDATED option to REINDEX (and reindexdb --outdated). So this is exercised
with added tests for both matching and non matching collation version.

#36

Mark Dilger

mark.dilger@enterprisedb.com

almost 5 years ago

In reply to: Julien Rouhaud (#35)

Re: REINDEX backend filtering

On Mar 15, 2021, at 10:50 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:

I'm saying that your patch seems to call down to get_collation_actual_version() via get_collation_version_for_oid() from your new function do_check_index_has_outdated_collation(), but I'm not seeing how that gets exercised.

It's a little bit late here so sorry if I'm missing something.

do_check_index_has_outdated_collation() is called from
index_has_outdated_collation() which is called from
index_has_outdated_dependency() which is called from
RelationGetIndexListFiltered(), and that function is called when passing the
OUTDATED option to REINDEX (and reindexdb --outdated). So this is exercised
with added tests for both matching and non matching collation version.

Ok, fair enough. I was thinking about the case where the collation actually returns a different version number because it (the C library providing the collation) got updated, but I think you've answered already that you are not planning to test that case, only the case where pg_depend is modified to have a bogus version number.

It seems a bit odd to me that a feature intended to handle cases where collations are updated is not tested via having a collation be updated during the test. It leaves open the possibility that something differs between the test and reindexed run after real world collation updates. But that's for the committer who picks up your patch to decide, and perhaps it is unfair to make your patch depend on addressing that issue.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#37

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Mark Dilger (#36)

Re: REINDEX backend filtering

On Mon, Mar 15, 2021 at 10:56:50AM -0700, Mark Dilger wrote:

On Mar 15, 2021, at 10:50 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:

I'm saying that your patch seems to call down to get_collation_actual_version() via get_collation_version_for_oid() from your new function do_check_index_has_outdated_collation(), but I'm not seeing how that gets exercised.

It's a little bit late here so sorry if I'm missing something.

do_check_index_has_outdated_collation() is called from
index_has_outdated_collation() which is called from
index_has_outdated_dependency() which is called from
RelationGetIndexListFiltered(), and that function is called when passing the
OUTDATED option to REINDEX (and reindexdb --outdated). So this is exercised
with added tests for both matching and non matching collation version.

Ok, fair enough. I was thinking about the case where the collation actually returns a different version number because it (the C library providing the collation) got updated, but I think you've answered already that you are not planning to test that case, only the case where pg_depend is modified to have a bogus version number.

This infrastructure is supposed to detect that the collation library *used to*
return a different version before it was updated. And that's exactly what
we're testing by manually updating the refobjversion.

It seems a bit odd to me that a feature intended to handle cases where collations are updated is not tested via having a collation be updated during the test. It leaves open the possibility that something differs between the test and reindexed run after real world collation updates. But that's for the committer who picks up your patch to decide, and perhaps it is unfair to make your patch depend on addressing that issue.

Why is that odd? We're testing that we're correctly storing the collation
version during index creating and correctly detecting a mismatch. Having a
fake collation provider to return a fake version number won't add any more
coverage unless I'm missing something.

It's similar to how we test the various corruption scenario. AFAIK we're not
providing custom drivers to write corrupted data but we're simply simulating a
corruption overwriting some blocks.

#38

Mark Dilger

mark.dilger@enterprisedb.com

almost 5 years ago

In reply to: Julien Rouhaud (#37)

Re: REINDEX backend filtering

On Mar 15, 2021, at 11:10 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 15, 2021 at 10:56:50AM -0700, Mark Dilger wrote:

On Mar 15, 2021, at 10:50 AM, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Mon, Mar 15, 2021 at 10:40:25AM -0700, Mark Dilger wrote:

I'm saying that your patch seems to call down to get_collation_actual_version() via get_collation_version_for_oid() from your new function do_check_index_has_outdated_collation(), but I'm not seeing how that gets exercised.

It's a little bit late here so sorry if I'm missing something.

do_check_index_has_outdated_collation() is called from
index_has_outdated_collation() which is called from
index_has_outdated_dependency() which is called from
RelationGetIndexListFiltered(), and that function is called when passing the
OUTDATED option to REINDEX (and reindexdb --outdated). So this is exercised
with added tests for both matching and non matching collation version.

Ok, fair enough. I was thinking about the case where the collation actually returns a different version number because it (the C library providing the collation) got updated, but I think you've answered already that you are not planning to test that case, only the case where pg_depend is modified to have a bogus version number.

This infrastructure is supposed to detect that the collation library *used to*
return a different version before it was updated. And that's exactly what
we're testing by manually updating the refobjversion.

It seems a bit odd to me that a feature intended to handle cases where collations are updated is not tested via having a collation be updated during the test. It leaves open the possibility that something differs between the test and reindexed run after real world collation updates. But that's for the committer who picks up your patch to decide, and perhaps it is unfair to make your patch depend on addressing that issue.

Why is that odd? We're testing that we're correctly storing the collation
version during index creating and correctly detecting a mismatch. Having a
fake collation provider to return a fake version number won't add any more
coverage unless I'm missing something.

It's similar to how we test the various corruption scenario. AFAIK we're not
providing custom drivers to write corrupted data but we're simply simulating a
corruption overwriting some blocks.

We do test corrupt relations. We intentionally corrupt the pages within corrupted heap tables to check that they get reported as corrupt. (See src/bin/pg_amcheck/t/004_verify_heapam.pl) Admittedly, the corruptions used in the tests are not necessarily representative of corruptions that might occur in the wild, but that is a hard problem to solve, since we don't know the statistical distribution of corruptions in the wild.

If you had a real, not fake, collation provider which actually provided a collation with an actual version number, stopped the server, changed the behavior of the collation as well as its version number, started the server, and ran REINDEX (OUTDATED), I think that would be a more real-world test. I'm not demanding that you write such a test. I'm just saying that it is strange that we don't have coverage for this anywhere, and was asking if you think there is such coverage, because, you know, maybe I just didn't see where that test was lurking.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#39

Mark Dilger

mark.dilger@enterprisedb.com

almost 5 years ago

In reply to: Mark Dilger (#38)

Re: REINDEX backend filtering

On Mar 15, 2021, at 11:32 AM, Mark Dilger <mark.dilger@enterprisedb.com> wrote:

If you had a real, not fake, collation provider which actually provided a collation with an actual version number, stopped the server, changed the behavior of the collation as well as its version number, started the server, and ran REINDEX (OUTDATED), I think that would be a more real-world test. I'm not demanding that you write such a test. I'm just saying that it is strange that we don't have coverage for this anywhere, and was asking if you think there is such coverage, because, you know, maybe I just didn't see where that test was lurking.

I should add some context regarding why I mentioned this issue at all.

Not long ago, if an upgrade of icu or libc broke your collations, you were sad. But postgres didn't claim to be competent to deal with this problem, so it was just a missing feature. Now, with REINDEX (OUTDATED), we're really implying, if not outright saying, that postgres knows how to deal with collation upgrades. I feel uncomfortable that v14 will make such a claim with not a single regression test confirming such a claim. I'm happy to discover that such a test is lurking somewhere and I just didn't see it.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#40

Julien Rouhaud

rjuju123@gmail.com

almost 5 years ago

In reply to: Mark Dilger (#38)

Re: REINDEX backend filtering

On Tue, Mar 16, 2021 at 2:32 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:

We do test corrupt relations. We intentionally corrupt the pages within corrupted heap tables to check that they get reported as corrupt. (See src/bin/pg_amcheck/t/004_verify_heapam.pl)

I disagree. You're testing a modified version of the pages in OS
cache, which is very likely to be different from real world
corruption. Those usually end up with a discrepancy between storage
and OS cache and this scenario isn't tested nor documented.