Progress report of CREATE INDEX for nested partitioned tables

Started by Ilya Gladyshevabout 3 years ago47 messages
#1Ilya Gladyshev
ilya.v.gladyshev@gmail.com
1 attachment(s)

Hi,

I have noticed that progress reporting for CREATE INDEX of partitioned
tables seems to be working poorly for nested partitioned tables. In
particular, it overwrites total and done partitions count when it
recurses down to child partitioned tables and it only reports top-level
partitions. So it's not hard to see something like this during CREATE
INDEX now:

postgres=# select partitions_total, partitions_done from
pg_stat_progress_create_index ;
partitions_total | partitions_done
------------------+-----------------
1 | 2
(1 row)

I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

Thanks,
Ilya Gladyshev

Attachments:

0001-report-top-parent-progress-for-CREATE-INDEX.patchtext/x-patch; charset=UTF-8; name=0001-report-top-parent-progress-for-CREATE-INDEX.patchDownload
From 342caa3d4ce273b14a6998c20c32b862d2d4c890 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Fri, 9 Dec 2022 23:17:29 +0400
Subject: [PATCH] report top parent progress for CREATE INDEX

---
 src/backend/commands/indexcmds.c | 36 ++++++++++++++++++++++++++------
 1 file changed, 30 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3ab..80557dad8d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -129,6 +129,12 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+/*
+ * Used to report the number of partitions,
+ * that were processed during index creation.
+ */
+static int create_index_parts_done;
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1218,8 +1224,18 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				List *inhs;
+
+				/* Reset counter of done partitions when processing root index */
+				create_index_parts_done = 0;
+				inhs = find_all_inheritors(relationId, NoLock, NULL);
+				/* inheritance tree size without root itself */
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 list_length(inhs) - 1);
+				list_free(inhs);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1431,8 +1447,6 @@ DefineIndex(Oid relationId,
 										   child_save_sec_context);
 				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1465,13 +1479,17 @@ DefineIndex(Oid relationId,
 
 		/*
 		 * Indexes on partitioned tables are not themselves built, so we're
-		 * done here.
+		 * done here. If it is a child partitioned table, increment done parts counter.
 		 */
 		AtEOXact_GUC(false, root_save_nestlevel);
 		SetUserIdAndSecContext(root_save_userid, root_save_sec_context);
 		table_close(rel, NoLock);
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+										 ++create_index_parts_done);
+
 		return address;
 	}
 
@@ -1483,9 +1501,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done, otherwise, increment
+		 * done partition counter.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+										 ++create_index_parts_done);
 
 		return address;
 	}
-- 
2.30.2

#2Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#1)
1 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sat, Dec 10, 2022 at 12:18:32PM +0400, Ilya Gladyshev wrote:

Hi,

I have noticed that progress reporting for CREATE INDEX of partitioned
tables seems to be working poorly for nested partitioned tables. In
particular, it overwrites total and done partitions count when it
recurses down to child partitioned tables and it only reports top-level
partitions.�So it's not hard to see something like this during CREATE
INDEX now:

postgres=# select partitions_total, partitions_done from
pg_stat_progress_create_index ;
partitions_total | partitions_done
------------------+-----------------
1 | 2
(1 row)

Yeah. I didn't verify, but it looks like this is a bug going to back to
v12. As you said, when called recursively, DefineIndex() clobbers the
number of completed partitions.

Maybe DefineIndex() could flatten the list of partitions. But I don't
think that can work easily with iteration rather than recursion.

Could you check what I've written as a counter-proposal ?

As long as we're changing partitions_done to include nested
sub-partitions, it seems to me like we should exclude intermediate
"catalog-only" partitioned indexes, and count only physical leaf
partitions. Should it alo exclude any children with matching indexes,
which will also be catalog-only changes? Probably not.

The docs say:
|When creating an index on a partitioned table, this column is set to the
|total number of partitions on which the index is to be created. This
|field is 0 during a REINDEX.

I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

I wrote a bunch of assertions for this, which seems to have uncovered an
similar issue with COPY progress reporting, dating to 8a4f618e7. I'm
not sure the assertions are okay. I imagine they may break other
extensions, as with file_fdw.

--
Justin

Attachments:

0001-fix-progress-reporting-of-nested-partitioned-indexes.patchtext/x-diff; charset=us-asciiDownload
From b8077babf9a101f9d1bf41dd1ad866d2ea38b603 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 10 Dec 2022 16:17:50 -0600
Subject: [PATCH] fix progress reporting of nested, partitioned indexes..

---
 src/backend/commands/indexcmds.c              | 51 +++++++++++--
 src/backend/utils/activity/backend_progress.c | 72 +++++++++++++++++++
 2 files changed, 118 insertions(+), 5 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3abf..6caa1f94ed7 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -482,6 +482,26 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
 	}
 }
 
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+num_leaf_partitions(Oid relid)
+{
+	int		nleaves = 0;
+	List	*childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	return nleaves;
+}
 
 /*
  * DefineIndex
@@ -1212,14 +1232,22 @@ DefineIndex(Oid relationId,
 		partdesc = RelationGetPartitionDesc(rel, true);
 		if ((!stmt->relation || stmt->relation->inh) && partdesc->nparts > 0)
 		{
-			int			nparts = partdesc->nparts;
+			int			nparts = partdesc->nparts; /* only direct children */
+			int			nparts_done = 0; /* direct and indirect children */
 			Oid		   *part_oids = palloc_array(Oid, nparts);
 			bool		invalidate_parent = false;
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				/*
+				 * Report the number of leaf partitions (excluding foreign
+				 * tables), not just direct children.
+				 */
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 num_leaf_partitions(relationId));
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1431,8 +1459,21 @@ DefineIndex(Oid relationId,
 										   child_save_sec_context);
 				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
+				if (!OidIsValid(parentIndexId))
+				{
+					/*
+					 * Report progress only when processing top-level indexes.
+					 * When recursing, the called functions wouldn't know the
+					 * current number of partitions which were processed.
+					 * After recursing, increment by the number of children.
+					 * This works also for physical/leaf partitions.
+					 */
+					nparts_done += num_leaf_partitions(childRelid);
+
+					pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+												 nparts_done);
+				}
+
 				free_attrmap(attmap);
 			}
 
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index f29199725b7..1db1332ebe6 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,75 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64		*a = beentry->st_progress_param;
+
+	switch (beentry->st_progress_command)
+	{
+	case PROGRESS_COMMAND_VACUUM:
+		Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] ==
+				a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+		Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] ==
+				a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+		Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+				a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+		break;
+
+	case PROGRESS_COMMAND_ANALYZE:
+		Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] ==
+				a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+		Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <=
+				a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]);
+		Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] ==
+				a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]);
+		break;
+
+	case PROGRESS_COMMAND_CLUSTER:
+		Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] ==
+				a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]);
+		break;
+
+	case PROGRESS_COMMAND_CREATE_INDEX:
+		Assert(a[PROGRESS_CREATEIDX_TUPLES_DONE] <=
+				a[PROGRESS_CREATEIDX_TUPLES_TOTAL]);
+		Assert(a[PROGRESS_CREATEIDX_PARTITIONS_DONE] ==
+				a[PROGRESS_CREATEIDX_PARTITIONS_TOTAL]);
+		break;
+
+	case PROGRESS_COMMAND_BASEBACKUP:
+		/* progress is optional */
+		if (a[PROGRESS_BASEBACKUP_BACKUP_TOTAL] >= 0)
+		{
+			Assert(a[PROGRESS_BASEBACKUP_BACKUP_STREAMED] ==
+					a[PROGRESS_BASEBACKUP_BACKUP_TOTAL]);
+			Assert(a[PROGRESS_BASEBACKUP_TBLSPC_STREAMED] ==
+					a[PROGRESS_BASEBACKUP_TBLSPC_TOTAL]);
+		}
+		break;
+
+	case PROGRESS_COMMAND_COPY:
+#if 0
+// This currently fails file_fdw tests, since pgstat_prorgress evidently fails
+// to support simultaneous copy commands, as happens during JOIN.
+		/* bytes progress is not available in all cases */
+		if (a[PROGRESS_COPY_BYTES_TOTAL] > 0)
+			// Assert(a[PROGRESS_COPY_BYTES_PROCESSED] <= a[PROGRESS_COPY_BYTES_TOTAL]);
+			if (a[PROGRESS_COPY_BYTES_PROCESSED] > a[PROGRESS_COPY_BYTES_TOTAL])
+				elog(WARNING, "PROGRESS_COPY_BYTES_PROCESSED %ld %ld",
+							a[PROGRESS_COPY_BYTES_PROCESSED],
+							a[PROGRESS_COPY_BYTES_TOTAL]);
+
+		break;
+#endif
+
+	case PROGRESS_COMMAND_INVALID:
+		break; /* Do nothing */
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -105,6 +175,8 @@ pgstat_progress_end_command(void)
 	if (beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
 		return;
 
+	pgstat_progress_asserts();
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
 	beentry->st_progress_command_target = InvalidOid;
-- 
2.25.1

#3Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#2)
Re: Progress report of CREATE INDEX for nested partitioned tables

Could you check what I've written as a counter-proposal ?

I think that this might be a good solution to start with, it gives us the opportunity to improve the granularity later without any surprising changes for the end user. We could use this patch for previous versions and make more granular output in the latest. What do you think?

As long as we're changing partitions_done to include nested
sub-partitions, it seems to me like we should exclude intermediate
"catalog-only" partitioned indexes, and count only physical leaf
partitions. Should it alo exclude any children with matching indexes,
which will also be catalog-only changes? Probably not.

The docs say:
|When creating an index on a partitioned table, this column is set to the
|total number of partitions on which the index is to be created. This
|field is 0 during a REINDEX.

I agree with you on catalog-only partitioned indexes, but I think that in the perfect world we should exclude all the relations where the index isn’t actually created, so that means excluding attached indexes as well. However, IMO doing it this way will require too much of a code rewrite for quite a minor feature (but we could do it, ofc). I actually think that the progress view would be better off without the total number of partitions, but I’m not sure we have this option now. With this in mind, I think your proposal to exclude catalog-only indexes sounds reasonable to me, but I feel like the docs are off in this case, because the attached indexes are not created, but we pretend like they are in this metric, so we should fix one or the other.

Show quoted text

I changed current behaviour to report the total number of partitions in
the inheritance tree and fixed recursion in the attached patch. I used
a static variable to keep the counter to avoid ABI breakage of
DefineIndex, so that we could backpatch this to previous versions.

I wrote a bunch of assertions for this, which seems to have uncovered an
similar issue with COPY progress reporting, dating to 8a4f618e7. I'm
not sure the assertions are okay. I imagine they may break other
extensions, as with file_fdw.

--
Justin
<0001-fix-progress-reporting-of-nested-partitioned-indexes.patch>

#4Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#3)
2 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Mon, Dec 12, 2022 at 11:39:23PM +0400, Ilya Gladyshev wrote:

Could you check what I've written as a counter-proposal ?

I think that this might be a good solution to start with, it gives us the opportunity to improve the granularity later without any surprising changes for the end user. We could use this patch for previous versions and make more granular output in the latest. What do you think?

Somehow, it hadn't occured to me that my patch "lost granularity" by
incrementing the progress bar by more than one... Shoot.

I actually think that the progress view would be better off without the total number of partitions,

Just curious - why ?

With this in mind, I think your proposal to exclude catalog-only indexes sounds reasonable to me, but I feel like the docs are off in this case, because the attached indexes are not created, but we pretend like they are in this metric, so we should fix one or the other.

I agree that the docs should indicate whether we're counting "all
partitions", "direct partitions", and whether or not that includes
partitioned partitions, or just leaf partitions.

I have another proposal: since the original patch 3.5 years ago didn't
consider or account for sub-partitions, let's not start counting them
now. It was never defined whether they were included or not (and I
guess that they're not common) so we can take this opportunity to
clarify the definition.

Alternately, if it's okay to add nparts_done to the IndexStmt, then
that's easy.

--
Justin

Attachments:

0001-report-top-parent-progress-for-CREATE-INDEX.patchtext/x-diff; charset=us-asciiDownload
From 2e93bd37ca3c8add1f8e3e44a9f3906c332b83f2 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Fri, 9 Dec 2022 23:17:29 +0400
Subject: [PATCH] report top parent progress for CREATE INDEX

! add asserts, avoid global var
! do not count intermediate/nested/sub-partitions
---
 src/backend/commands/indexcmds.c              | 30 ++++++-
 src/backend/utils/activity/backend_progress.c | 79 +++++++++++++++++++
 src/include/nodes/parsenodes.h                |  2 +
 3 files changed, 107 insertions(+), 4 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3abf..6e6bba9d3a9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1218,8 +1218,28 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int			nleaves = 0;
+				List		*childs;
+				ListCell	*lc;
+
+				/*
+				 * Count the number of physical children, excluding foreign
+				 * tables, intermediate partitioned tables, as well as the
+				 * partitioned index itself.
+				 */
+				childs = find_all_inheritors(relationId, NoLock, NULL);
+				foreach(lc, childs)
+				{
+					Oid		partrelid = lfirst_oid(lc);
+					if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+						nleaves++;
+				}
+
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 nleaves);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1431,8 +1451,10 @@ DefineIndex(Oid relationId,
 										   child_save_sec_context);
 				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
+				if (RELKIND_HAS_STORAGE(get_rel_relkind(childRelid)))
+					pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+												 ++stmt->nparts_done);
+
 				free_attrmap(attmap);
 			}
 
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index f29199725b7..c661ad94782 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,82 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64		*a = beentry->st_progress_param;
+
+	/*
+	 * If the command fails due to interrupt or error, the values may be
+	 * less than rather than equal to expected, final value.
+	 */
+
+	switch (beentry->st_progress_command)
+	{
+	case PROGRESS_COMMAND_VACUUM:
+		Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <=
+				a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+		Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <=
+				a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+		Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+				a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+		break;
+
+	case PROGRESS_COMMAND_ANALYZE:
+		Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <=
+				a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+		/* Extended stats can be skipped */
+		Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <=
+				a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]);
+		Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] <=
+				a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]);
+		break;
+
+	case PROGRESS_COMMAND_CLUSTER:
+		Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] <=
+				a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]);
+		break;
+
+	case PROGRESS_COMMAND_CREATE_INDEX:
+		Assert(a[PROGRESS_CREATEIDX_TUPLES_DONE] <=
+				a[PROGRESS_CREATEIDX_TUPLES_TOTAL]);
+		/* The command can fail before finishing indexes */
+		Assert(a[PROGRESS_CREATEIDX_PARTITIONS_DONE] <=
+				a[PROGRESS_CREATEIDX_PARTITIONS_TOTAL]);
+		break;
+
+	case PROGRESS_COMMAND_BASEBACKUP:
+		/* progress reporting is optional for these */
+		if (a[PROGRESS_BASEBACKUP_BACKUP_TOTAL] >= 0)
+		{
+			Assert(a[PROGRESS_BASEBACKUP_BACKUP_STREAMED] <=
+					a[PROGRESS_BASEBACKUP_BACKUP_TOTAL]);
+			Assert(a[PROGRESS_BASEBACKUP_TBLSPC_STREAMED] <=
+					a[PROGRESS_BASEBACKUP_TBLSPC_TOTAL]);
+		}
+		break;
+
+	case PROGRESS_COMMAND_COPY:
+#if 0
+// This currently fails file_fdw tests, since pgstat_prorgress evidently fails
+// to support simultaneous copy commands, as happens during JOIN.
+		/* bytes progress is not available in all cases */
+		if (a[PROGRESS_COPY_BYTES_TOTAL] > 0)
+			// Assert(a[PROGRESS_COPY_BYTES_PROCESSED] <= a[PROGRESS_COPY_BYTES_TOTAL]);
+			if (a[PROGRESS_COPY_BYTES_PROCESSED] > a[PROGRESS_COPY_BYTES_TOTAL])
+				elog(WARNING, "PROGRESS_COPY_BYTES_PROCESSED %ld %ld",
+							a[PROGRESS_COPY_BYTES_PROCESSED],
+							a[PROGRESS_COPY_BYTES_TOTAL]);
+
+		break;
+#endif
+
+	case PROGRESS_COMMAND_INVALID:
+		break; /* Do nothing */
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -105,6 +182,8 @@ pgstat_progress_end_command(void)
 	if (beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
 		return;
 
+	pgstat_progress_asserts();
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
 	beentry->st_progress_command_target = InvalidOid;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index bebb9620b27..e07527d80f2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3015,6 +3015,8 @@ typedef struct IndexStmt
 	bool		if_not_exists;	/* just do nothing if index already exists? */
 	bool		reset_default_tblspc;	/* reset default_tablespace prior to
 										 * executing */
+	int			nparts_done;	/* number of partitions processed during
+								 * index creation */
 } IndexStmt;
 
 /* ----------------------
-- 
2.25.1

0001-report-top-parent-progress-for-CREATE-INDEX.txttext/x-diff; charset=us-asciiDownload
From cc6237f4f8e32c63611d37839758b1a30a174c69 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Fri, 9 Dec 2022 23:17:29 +0400
Subject: [PATCH] report top parent progress for CREATE INDEX

---
 doc/src/sgml/monitoring.sgml                  |  2 +
 src/backend/commands/indexcmds.c              | 17 +++-
 src/backend/utils/activity/backend_progress.c | 79 +++++++++++++++++++
 3 files changed, 94 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 11a8ebe5ec7..c2389477356 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6536,6 +6536,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       <para>
        When creating an index on a partitioned table, this column is set to
        the total number of partitions on which the index is to be created.
+       This includes direct children, but not indirect children.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6547,6 +6548,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       <para>
        When creating an index on a partitioned table, this column is set to
        the number of partitions on which the index has been created.
+       This includes direct children, but not indirect children.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3abf..e97ba963c37 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1218,8 +1218,12 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				/* Set the number of partitions to the number of *direct* children */
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 nparts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1284,6 +1288,9 @@ DefineIndex(Oid relationId,
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 					table_close(childrel, lockmode);
+					if (!OidIsValid(parentIndexId))
+						pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+													 i + 1);
 					continue;
 				}
 
@@ -1431,8 +1438,10 @@ DefineIndex(Oid relationId,
 										   child_save_sec_context);
 				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
+				if (!OidIsValid(parentIndexId))
+					pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+												 i + 1);
+
 				free_attrmap(attmap);
 			}
 
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index f29199725b7..c661ad94782 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,82 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64		*a = beentry->st_progress_param;
+
+	/*
+	 * If the command fails due to interrupt or error, the values may be
+	 * less than rather than equal to expected, final value.
+	 */
+
+	switch (beentry->st_progress_command)
+	{
+	case PROGRESS_COMMAND_VACUUM:
+		Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <=
+				a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+		Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <=
+				a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+		Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+				a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+		break;
+
+	case PROGRESS_COMMAND_ANALYZE:
+		Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <=
+				a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+		/* Extended stats can be skipped */
+		Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <=
+				a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]);
+		Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] <=
+				a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]);
+		break;
+
+	case PROGRESS_COMMAND_CLUSTER:
+		Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] <=
+				a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]);
+		break;
+
+	case PROGRESS_COMMAND_CREATE_INDEX:
+		Assert(a[PROGRESS_CREATEIDX_TUPLES_DONE] <=
+				a[PROGRESS_CREATEIDX_TUPLES_TOTAL]);
+		/* The command can fail before finishing indexes */
+		Assert(a[PROGRESS_CREATEIDX_PARTITIONS_DONE] <=
+				a[PROGRESS_CREATEIDX_PARTITIONS_TOTAL]);
+		break;
+
+	case PROGRESS_COMMAND_BASEBACKUP:
+		/* progress reporting is optional for these */
+		if (a[PROGRESS_BASEBACKUP_BACKUP_TOTAL] >= 0)
+		{
+			Assert(a[PROGRESS_BASEBACKUP_BACKUP_STREAMED] <=
+					a[PROGRESS_BASEBACKUP_BACKUP_TOTAL]);
+			Assert(a[PROGRESS_BASEBACKUP_TBLSPC_STREAMED] <=
+					a[PROGRESS_BASEBACKUP_TBLSPC_TOTAL]);
+		}
+		break;
+
+	case PROGRESS_COMMAND_COPY:
+#if 0
+// This currently fails file_fdw tests, since pgstat_prorgress evidently fails
+// to support simultaneous copy commands, as happens during JOIN.
+		/* bytes progress is not available in all cases */
+		if (a[PROGRESS_COPY_BYTES_TOTAL] > 0)
+			// Assert(a[PROGRESS_COPY_BYTES_PROCESSED] <= a[PROGRESS_COPY_BYTES_TOTAL]);
+			if (a[PROGRESS_COPY_BYTES_PROCESSED] > a[PROGRESS_COPY_BYTES_TOTAL])
+				elog(WARNING, "PROGRESS_COPY_BYTES_PROCESSED %ld %ld",
+							a[PROGRESS_COPY_BYTES_PROCESSED],
+							a[PROGRESS_COPY_BYTES_TOTAL]);
+
+		break;
+#endif
+
+	case PROGRESS_COMMAND_INVALID:
+		break; /* Do nothing */
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -105,6 +182,8 @@ pgstat_progress_end_command(void)
 	if (beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
 		return;
 
+	pgstat_progress_asserts();
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
 	beentry->st_progress_command_target = InvalidOid;
-- 
2.25.1

#5Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#4)
2 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Mon, 2022-12-12 at 22:43 -0600, Justin Pryzby wrote:

On Mon, Dec 12, 2022 at 11:39:23PM +0400, Ilya Gladyshev wrote:

Could you check what I've written as a counter-proposal ?

I think that this might be a good solution to start with, it gives
us the opportunity to improve the granularity later without any
surprising changes for the end user. We could use this patch for
previous versions and make more granular output in the latest. What
do you think?

Somehow, it hadn't occured to me that my patch "lost granularity" by
incrementing the progress bar by more than one...  Shoot.

I actually think that the progress view would be better off without
the total number of partitions,

Just curious - why ?

We don't really know how many indexes we are going to create, unless we
have some kind of preliminary "planning" stage where we acumulate all
the relations that will need to have indexes created (rather than
attached). And if someone wants the total, it can be calculated
manually without this view, it's less user-friendly, but if we can't do
it well, I would leave it up to the user.

With this in mind, I think your proposal to exclude catalog-only
indexes sounds reasonable to me, but I feel like the docs are off
in this case, because the attached indexes are not created, but we
pretend like they are in this metric, so we should fix one or the
other.

I agree that the docs should indicate whether we're counting "all
partitions", "direct partitions", and whether or not that includes
partitioned partitions, or just leaf partitions.

Agree. I think that docs should also be explicit about the attached
indexes, if we decide to count them in as "created".

I have another proposal: since the original patch 3.5 years ago
didn't
consider or account for sub-partitions, let's not start counting them
now.  It was never defined whether they were included or not (and I
guess that they're not common) so we can take this opportunity to
clarify the definition.

I have had this thought initially, but then I thought that it's not
what I would want, if I was to track progress of multi-level
partitioned tables (but yeah, I guess it's pretty uncommon). In this
respect, I like your initial counter-proposal more, because it leaves
us room to improve this in the future. Otherwise, if we commit to
reporting only top-level partitions now, I'm not sure we will have the
opportunity to change this.

Alternately, if it's okay to add nparts_done to the IndexStmt, then
that's easy.

Yeah, or we could add another argument to DefineIndex. I don't know if
it's ok, or which option is better here in terms of compatibility and
interface-wise, so I have tried both of them, see the attached patches.

Attachments:

0001-parts_done-via-DefineIndex-args.patch.txttext/plain; charset=UTF-8; name=0001-parts_done-via-DefineIndex-args.patch.txtDownload
From 4f55526f8a534d6b2f27b6b17cadaee0370e4cbc Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 13 Dec 2022 19:42:53 +0400
Subject: [PATCH] parts_done via DefineIndex args

---
 src/backend/commands/indexcmds.c | 52 +++++++++++++++++++++++++++-----
 src/backend/commands/tablecmds.c |  7 +++--
 src/backend/tcop/utility.c       |  3 +-
 src/include/commands/defrem.h    |  3 +-
 4 files changed, 53 insertions(+), 12 deletions(-)

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3ab..9fa9109b9e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -129,6 +129,29 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+num_leaf_partitions(Oid relid)
+{
+	int		nleaves = 0;
+	List	*childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -530,7 +553,8 @@ DefineIndex(Oid relationId,
 			bool check_rights,
 			bool check_not_in_use,
 			bool skip_build,
-			bool quiet)
+			bool quiet,
+			int *parts_done)
 {
 	bool		concurrent;
 	char	   *indexRelationName;
@@ -568,6 +592,7 @@ DefineIndex(Oid relationId,
 	Oid			root_save_userid;
 	int			root_save_sec_context;
 	int			root_save_nestlevel;
+	int root_parts_done;
 
 	root_save_nestlevel = NewGUCNestLevel();
 
@@ -1218,8 +1243,17 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int total_parts;
+
+				/* Init counter of done partitions when processing root index */
+				root_parts_done = 0;
+				parts_done = &root_parts_done;
+				total_parts = num_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1426,13 +1460,11 @@ DefineIndex(Oid relationId,
 								indexRelationId,	/* this is our child */
 								createdConstraintId,
 								is_alter_table, check_rights, check_not_in_use,
-								skip_build, quiet);
+								skip_build, quiet, parts_done);
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1483,9 +1515,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done, otherwise, increment
+		 * done partition counter.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else if (parts_done != NULL)
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+										 ++(*parts_done));
 
 		return address;
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0b352a5fff..dde57faf44 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1216,7 +1216,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 						InvalidOid,
 						RelationGetRelid(idxRel),
 						constraintOid,
-						false, false, false, false, false);
+						false, false, false, false, false, NULL);
 
 			index_close(idxRel, AccessShareLock);
 		}
@@ -8577,7 +8577,8 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 						  check_rights,
 						  false,	/* check_not_in_use - we did it already */
 						  skip_build,
-						  quiet);
+						  quiet,
+						  NULL);
 
 	/*
 	 * If TryReuseIndex() stashed a relfilenumber for us, we used it for the
@@ -18076,7 +18077,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 			DefineIndex(RelationGetRelid(attachrel), stmt, InvalidOid,
 						RelationGetRelid(idxRel),
 						conOid,
-						true, false, false, false, false);
+						true, false, false, false, false, NULL);
 		}
 
 		index_close(idxRel, AccessShareLock);
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 247d0816ad..daefd726cc 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1549,7 +1549,8 @@ ProcessUtilitySlow(ParseState *pstate,
 									true,	/* check_rights */
 									true,	/* check_not_in_use */
 									false,	/* skip_build */
-									false); /* quiet */
+									false, /* quiet */
+									NULL);
 
 					/*
 					 * Add the CREATE INDEX node itself to stash right away;
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 1d3ce246c9..b6eb8564c1 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -33,7 +33,8 @@ extern ObjectAddress DefineIndex(Oid relationId,
 								 bool check_rights,
 								 bool check_not_in_use,
 								 bool skip_build,
-								 bool quiet);
+								 bool quiet,
+								 int *parts_done);
 extern void ExecReindex(ParseState *pstate, ReindexStmt *stmt, bool isTopLevel);
 extern char *makeObjectName(const char *name1, const char *name2,
 							const char *label);
-- 
2.30.2

0001-parts_done-via-IndexStmt.patchtext/x-patch; charset=UTF-8; name=0001-parts_done-via-IndexStmt.patchDownload
From 8d0798ece406ac88ea0b1dfdfa91aa234d99c560 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 13 Dec 2022 22:04:45 +0400
Subject: [PATCH] parts_done via IndexStmt

---
 src/backend/bootstrap/bootparse.y  |  2 ++
 src/backend/commands/indexcmds.c   | 48 ++++++++++++++++++++++++++----
 src/backend/parser/parse_utilcmd.c |  2 ++
 src/include/nodes/parsenodes.h     |  1 +
 4 files changed, 48 insertions(+), 5 deletions(-)

diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index e6d62d81c1..6be3cc2dcf 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -296,6 +296,7 @@ Boot_DeclareIndexStmt:
 					stmt->concurrent = false;
 					stmt->if_not_exists = false;
 					stmt->reset_default_tblspc = false;
+					stmt->parts_done = -1;
 
 					/* locks and races need not concern us in bootstrap mode */
 					relationId = RangeVarGetRelid(stmt->relation, NoLock,
@@ -348,6 +349,7 @@ Boot_DeclareUniqueIndexStmt:
 					stmt->concurrent = false;
 					stmt->if_not_exists = false;
 					stmt->reset_default_tblspc = false;
+					stmt->parts_done = -1;
 
 					/* locks and races need not concern us in bootstrap mode */
 					relationId = RangeVarGetRelid(stmt->relation, NoLock,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b5b860c3ab..1904a3ddd9 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -129,6 +129,29 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+num_leaf_partitions(Oid relid)
+{
+	int		nleaves = 0;
+	List	*childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1218,8 +1241,16 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int total_parts;
+
+				/* Init counter of done partitions when processing root index */
+				stmt->parts_done = 0;
+				total_parts = num_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1429,10 +1460,11 @@ DefineIndex(Oid relationId,
 								skip_build, quiet);
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
+					/* childStmt has more recent data on done parts, update the parent */
+					stmt->parts_done = childStmt->parts_done;
 				}
+				/* TODO: Should we add num_leaf_partitions() for attached idxs? */
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1483,9 +1515,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done, otherwise, increment
+		 * done partition counter.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else if (stmt->parts_done >= 0)
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+										 ++stmt->parts_done);
 
 		return address;
 	}
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index f743cd548c..27335b0aba 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1599,6 +1599,7 @@ generateClonedIndexStmt(RangeVar *heapRel, Relation source_idx,
 	index->concurrent = false;
 	index->if_not_exists = false;
 	index->reset_default_tblspc = false;
+	index->parts_done = -1;
 
 	/*
 	 * We don't try to preserve the name of the source index; instead, just
@@ -2217,6 +2218,7 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
 	index->concurrent = false;
 	index->if_not_exists = false;
 	index->reset_default_tblspc = constraint->reset_default_tblspc;
+	index->parts_done = -1;
 
 	/*
 	 * If it's ALTER TABLE ADD CONSTRAINT USING INDEX, look up the index and
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index bebb9620b2..a8b69c387a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3015,6 +3015,7 @@ typedef struct IndexStmt
 	bool		if_not_exists;	/* just do nothing if index already exists? */
 	bool		reset_default_tblspc;	/* reset default_tablespace prior to
 										 * executing */
+	int parts_done;
 } IndexStmt;
 
 /* ----------------------
-- 
2.30.2

#6Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#5)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Tue, Dec 13, 2022 at 10:18:58PM +0400, Ilya Gladyshev wrote:

I actually think that the progress view would be better off without
the total number of partitions,

Just curious - why ?

We don't really know how many indexes we are going to create, unless we
have some kind of preliminary "planning" stage where we acumulate all
the relations that will need to have indexes created (rather than
attached). And if someone wants the total, it can be calculated
manually without this view, it's less user-friendly, but if we can't do
it well, I would leave it up to the user.

Thanks. One other reason is that the partitions (and sub-partitions)
may not be equally sized. Also, I've said before that it's weird to
report macroscopic progress about the number of partitions finihed in
the same place as reporting microscopic details like the number of
blocks done of the relation currently being processed.

I have another proposal: since the original patch 3.5 years ago
didn't
consider or account for sub-partitions, let's not start counting them
now.� It was never defined whether they were included or not (and I
guess that they're not common) so we can take this opportunity to
clarify the definition.

I have had this thought initially, but then I thought that it's not
what I would want, if I was to track progress of multi-level
partitioned tables (but yeah, I guess it's pretty uncommon). In this
respect, I like your initial counter-proposal more, because it leaves
us room to improve this in the future. Otherwise, if we commit to
reporting only top-level partitions now, I'm not sure we will have the
opportunity to change this.

We have the common problem of too many patches.

/messages/by-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing it to be incremented without the value being
lost to the caller.

/messages/by-id/20221211063334.GB27893@telsasoft.com
This also counts indirect children, but only increments the progress
reporting in the parent. This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter will
"jump" from (say) 20 to 30 without ever hitting 21-29.

/messages/by-id/20221213044331.GJ27893@telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
children. This is minimal, but discourages any future plan to track
progress involving intermediate partitions with finer granularity.
- A alternative patch adds IndexStmt.nparts_done, and allows reporting
fine-grained progress involving intermediate partitions.

/messages/by-id/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
This also reports progress of intermediate children. The first patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch). And an alternate patch does it by adding to IndexStmt.

@committers: Is it okay to add nparts_done to IndexStmt ?

--
Justin

#7Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#6)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

We have the common problem of too many patches.

/messages/by-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing it to be incremented without the value being
lost to the caller.

/messages/by-id/20221211063334.GB27893@telsasoft.com
This also counts indirect children, but only increments the progress
reporting in the parent. This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter will
"jump" from (say) 20 to 30 without ever hitting 21-29.

/messages/by-id/20221213044331.GJ27893@telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
children. This is minimal, but discourages any future plan to track
progress involving intermediate partitions with finer granularity.
- A alternative patch adds IndexStmt.nparts_done, and allows reporting
fine-grained progress involving intermediate partitions.

/messages/by-id/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
This also reports progress of intermediate children. The first patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch). And an alternate patch does it by adding to IndexStmt.

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

This should be resolved before the "CIC on partitioned tables" patch,
which I think is otherwise done.

#8Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#7)
1 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

We have the common problem of too many patches.

/messages/by-id/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel@gmail.com
This changes the progress reporting to show indirect children as
"total", and adds a global variable to track recursion into
DefineIndex(), allowing it to be incremented without the value
being
lost to the caller.

/messages/by-id/20221211063334.GB27893@telsasoft.com
This also counts indirect children, but only increments the
progress
reporting in the parent.  This has the disadvantage that when
intermediate partitions are in use, the done_partitions counter
will
"jump" from (say) 20 to 30 without ever hitting 21-29.

/messages/by-id/20221213044331.GJ27893@telsasoft.com
This has two alternate patches:
- One patch changes to only update progress reporting of *direct*
  children.  This is minimal, but discourages any future plan to
track
  progress involving intermediate partitions with finer
granularity.
- A alternative patch adds IndexStmt.nparts_done, and allows
reporting
  fine-grained progress involving intermediate partitions.

/messages/by-id/039564d234fc3d014c555a7ee98be69a9e724836.camel@gmail.com
This also reports progress of intermediate children.  The first
patch
does it by adding an argument to DefineIndex() (which isn't okay to
backpatch).  And an alternate patch does it by adding to IndexStmt.

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

This should be resolved before the "CIC on partitioned tables" patch,
which I think is otherwise done.

I suggest that we move on with the IndexStmt patch and see what the
committers have to say about it. I have brushed the patch up a bit,
fixing TODOs and adding docs as per our discussion above.

Attachments:

v2-0001-parts_done-via-IndexStmt.patchtext/x-patch; charset=UTF-8; name=v2-0001-parts_done-via-IndexStmt.patchDownload
From 490d8afa7cb952e5b3947d81d85bf9690499e8a3 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 13 Dec 2022 22:04:45 +0400
Subject: [PATCH v2] parts_done via IndexStmt

---
 doc/src/sgml/monitoring.sgml       |  4 +--
 src/backend/bootstrap/bootparse.y  |  2 ++
 src/backend/commands/indexcmds.c   | 57 +++++++++++++++++++++++++++---
 src/backend/parser/parse_utilcmd.c |  2 ++
 src/include/nodes/parsenodes.h     |  1 +
 5 files changed, 59 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index cf220c3bcb..2c32a92364 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6565,7 +6565,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6576,7 +6576,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of leaf partitions on which the index has been created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 86804bb598..dad9c38d90 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -296,6 +296,7 @@ Boot_DeclareIndexStmt:
 					stmt->concurrent = false;
 					stmt->if_not_exists = false;
 					stmt->reset_default_tblspc = false;
+					stmt->parts_done = -1;
 
 					/* locks and races need not concern us in bootstrap mode */
 					relationId = RangeVarGetRelid(stmt->relation, NoLock,
@@ -348,6 +349,7 @@ Boot_DeclareUniqueIndexStmt:
 					stmt->concurrent = false;
 					stmt->if_not_exists = false;
 					stmt->reset_default_tblspc = false;
+					stmt->parts_done = -1;
 
 					/* locks and races need not concern us in bootstrap mode */
 					relationId = RangeVarGetRelid(stmt->relation, NoLock,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 8afc006f89..be4d1c884e 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,29 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+num_leaf_partitions(Oid relid)
+{
+	int		nleaves = 0;
+	List	*childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1242,16 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int total_parts;
+
+				/* Init counter of done partitions when processing root index */
+				stmt->parts_done = 0;
+				total_parts = num_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1430,10 +1461,20 @@ DefineIndex(Oid relationId,
 								skip_build, quiet);
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
+					/* childStmt has more recent data on done parts, update the parent */
+					stmt->parts_done = childStmt->parts_done;
+				}
+				else
+				{
+					/*
+					 * If the index has been attached, count all of its leaf indexes as attached,
+					 * because PROGRESS_CREATEIDX_PARTITIONS_TOTAL includes them all initially.
+					 */
+					stmt->parts_done += num_leaf_partitions(childRelid);
+					pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+												 stmt->parts_done);
 				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1525,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done, otherwise, increment
+		 * done partition counter, if the counter is initialized.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else if (stmt->parts_done >= 0)
+			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+										 ++stmt->parts_done);
 
 		return address;
 	}
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index bffa9f8dd0..1a286694da 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -1599,6 +1599,7 @@ generateClonedIndexStmt(RangeVar *heapRel, Relation source_idx,
 	index->concurrent = false;
 	index->if_not_exists = false;
 	index->reset_default_tblspc = false;
+	index->parts_done = -1;
 
 	/*
 	 * We don't try to preserve the name of the source index; instead, just
@@ -2217,6 +2218,7 @@ transformIndexConstraint(Constraint *constraint, CreateStmtContext *cxt)
 	index->concurrent = false;
 	index->if_not_exists = false;
 	index->reset_default_tblspc = constraint->reset_default_tblspc;
+	index->parts_done = -1;
 
 	/*
 	 * If it's ALTER TABLE ADD CONSTRAINT USING INDEX, look up the index and
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index cfeca96d53..2191d28e4a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3004,6 +3004,7 @@ typedef struct IndexStmt
 	bool		if_not_exists;	/* just do nothing if index already exists? */
 	bool		reset_default_tblspc;	/* reset default_tablespace prior to
 										 * executing */
+	int parts_done; /* counter of created/attached indexes for a partitioned index */
 } IndexStmt;
 
 /* ----------------------
-- 
2.30.2

#9Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Ilya Gladyshev (#8)
Re: Progress report of CREATE INDEX for nested partitioned tables

On 1/9/23 09:44, Ilya Gladyshev wrote:

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

...

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

AFAIK fields added at the end of a struct is seen as acceptable from the
ABI point of view. It's not risk-free, but we did that multiple times
when fixing bugs, IIRC.

The primary risk is old extensions (built on older minor version)
running on new server, getting confused by new fields (and implied
shifts in the structs). But fields at the end should be safe - the
extension simply ignores the stuff at the end. The one problem would be
arrays of structs, because even a field at the end changes the array
stride. But I don't think we do that with IndexStmt ...

Of course, if the "old" extension itself allocates the struct and passes
it to core code, that might still be an issue, because it'll allocate a
smaller struct, and core might see bogus data at the end.

On the other hand, new extensions on old server may get confused too,
because it may try setting a field that does not exist.

So ultimately it's about weighing risks vs. benefits - evaluating
whether fixing the issue is actually worth it.

The question is if/how many such extensions messing with IndexStmt in
this way actually exist. That is, allocate IndexStmt (or array of it). I
haven't found any, but maybe some extensions for index or partition
management do it? Not sure.

But ...

Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

This should be resolved before the "CIC on partitioned tables" patch,
which I think is otherwise done.

I suggest that we move on with the IndexStmt patch and see what the
committers have to say about it. I have brushed the patch up a bit,
fixing TODOs and adding docs as per our discussion above.

I did take a look at the patch, so here are my 2c:

1) num_leaf_partitions says it's "excluding foreign tables" but then it
uses RELKIND_HAS_STORAGE() which excludes various others relkinds, e.g.
partitioned tables etc. Minor, but perhaps a bit confusing.

2) I'd probably say count_leaf_partitions() instead.

3) The new part in DefineIndex counting leaf partitions should have a
comment before

if (!OidIsValid(parentIndexId))
{ ... }

4) It's a bit confusing that one of the branches in DefineIndex just
sets stmt->parts_done without calling pgstat_progress_update_param
(while the other one does both). AFAICS the call is not needed because
we already updated it during the recursive DefineIndex call, but maybe
the comment should mention that?

As for the earlier discussion about the "correct" behavior for leaf vs.
non-leaf partitions and whether to calculate partitions in advance:

* I agree it's desirable to count partitions in advance, instead of
adding incrementally. The view is meant to provide "overview" of the
CREATE INDEX progress, and imagine you get

partitions_total partitions_done
10 9

so that you believe you're ~90% done. But then it jumps to the next
child and now you get

partitions_total partitions_done
20 10

which makes the view a bit useless for it's primary purpose, IMHO.

* I don't care very much about leaf vs. non-leaf partitions. If we
exclude non-leaf ones, fine with me. But the number of non-leaf ones
should be much smaller than leaf ones, and if the partition already has
a matching index that distorts the tracking too. Furthermore the
partitions may have different size etc. so the progress is only
approximate anyway.

I wonder if we could improve this to track the size of partitions for
total/done? That'd make leaf/non-leaf distinction unnecessary, because
non-leaf partitions have size 0.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#10Justin Pryzby
pryzby@telsasoft.com
In reply to: Tomas Vondra (#9)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:

On 1/9/23 09:44, Ilya Gladyshev wrote:

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

...

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

AFAIK fields added at the end of a struct is seen as acceptable from the
ABI point of view. It's not risk-free, but we did that multiple times
when fixing bugs, IIRC.

My question isn't whether it's okay to add a field at the end of a
struct in general, but rather whether it's acceptable to add this field
at the end of this struct, and not because it's unsafe to do in a minor
release, but whether someone is going to say that it's an abuse of the
data structure.

Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

Good idea to try.

As for the earlier discussion about the "correct" behavior for leaf vs.
non-leaf partitions and whether to calculate partitions in advance:

* I agree it's desirable to count partitions in advance, instead of
adding incrementally. The view is meant to provide "overview" of the
CREATE INDEX progress, and imagine you get

partitions_total partitions_done
10 9

so that you believe you're ~90% done. But then it jumps to the next
child and now you get

partitions_total partitions_done
20 10

which makes the view a bit useless for it's primary purpose, IMHO.

To be clear, that's the current, buggy behavior, and this thread is
about fixing it. The proposed patches all ought to avoid that.

But the bug isn't caused by not "calculating partitions in advance".
Rather, the issue is that currently, the "total" is overwritten while
recursing.

That's a separate question from whether indirect partitions are counted
or not.

I wonder if we could improve this to track the size of partitions for
total/done? That'd make leaf/non-leaf distinction unnecessary, because
non-leaf partitions have size 0.

Maybe, but it's out of scope for this patch.

Thanks for looking.

--
Justin

#11Tomas Vondra
tomas.vondra@enterprisedb.com
In reply to: Justin Pryzby (#10)
Re: Progress report of CREATE INDEX for nested partitioned tables

On 1/17/23 21:59, Justin Pryzby wrote:

On Tue, Jan 17, 2023 at 08:44:36PM +0100, Tomas Vondra wrote:

On 1/9/23 09:44, Ilya Gladyshev wrote:

On Sun, 2023-01-08 at 10:48 -0600, Justin Pryzby wrote:

On Sat, Dec 17, 2022 at 08:30:02AM -0600, Justin Pryzby wrote:

...

@committers: Is it okay to add nparts_done to IndexStmt ?

Any hint about this ?

AFAIK fields added at the end of a struct is seen as acceptable from the
ABI point of view. It's not risk-free, but we did that multiple times
when fixing bugs, IIRC.

My question isn't whether it's okay to add a field at the end of a
struct in general, but rather whether it's acceptable to add this field
at the end of this struct, and not because it's unsafe to do in a minor
release, but whether someone is going to say that it's an abuse of the
data structure.

Ah, you mean whether it's the right place for the parameter?

I don't think it is, really. IndexStmt is meant to be a description of
the CREATE INDEX statement, not something that includes info about how
it's processed. But it's the only struct we pass to the DefineIndex for
child indexes, so the only alternatives I can think of is a global
variable and the (existing) param array field.

Nevertheless, ABI compatibility is still relevant for backbranches.

Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

Good idea to try.

OK

As for the earlier discussion about the "correct" behavior for leaf vs.
non-leaf partitions and whether to calculate partitions in advance:

* I agree it's desirable to count partitions in advance, instead of
adding incrementally. The view is meant to provide "overview" of the
CREATE INDEX progress, and imagine you get

partitions_total partitions_done
10 9

so that you believe you're ~90% done. But then it jumps to the next
child and now you get

partitions_total partitions_done
20 10

which makes the view a bit useless for it's primary purpose, IMHO.

To be clear, that's the current, buggy behavior, and this thread is
about fixing it. The proposed patches all ought to avoid that.

But the bug isn't caused by not "calculating partitions in advance".
Rather, the issue is that currently, the "total" is overwritten while
recursing.

You're right the issue us about overwriting the total - not sure what I
was thinking about when writing this. I guess I got distracted by the
discussion about "preliminary planning" etc. Sorry for the confusion.

That's a separate question from whether indirect partitions are counted
or not.

I wonder if we could improve this to track the size of partitions for
total/done? That'd make leaf/non-leaf distinction unnecessary, because
non-leaf partitions have size 0.

Maybe, but it's out of scope for this patch.

+1, it was just an idea for future.

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#12Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#10)
Re: Progress report of CREATE INDEX for nested partitioned tables

TBH, I think the best approach is what I did in:
0001-report-top-parent-progress-for-CREATE-INDEX.txt

That's a minimal patch, ideal for backpatching.

..which defines/clarifies that the progress reporting is only for
*direct* children. That avoids the need to change any data structures,
and it's what was probably intended by the original patch, which doesn't
seem to have considered intermediate partitioned tables.

I think it'd be fine to re-define that in some future release, to allow
showing indirect children (probably only "leaves", and not intermediate
partitioned tables). Or "total_bytes" or other global progress.

--
Justin

#13Dean Rasheed
dean.a.rasheed@gmail.com
In reply to: Justin Pryzby (#12)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, 18 Jan 2023 at 15:25, Justin Pryzby <pryzby@telsasoft.com> wrote:

TBH, I think the best approach is what I did in:
0001-report-top-parent-progress-for-CREATE-INDEX.txt

That's a minimal patch, ideal for backpatching.

..which defines/clarifies that the progress reporting is only for
*direct* children. That avoids the need to change any data structures,
and it's what was probably intended by the original patch, which doesn't
seem to have considered intermediate partitioned tables.

I think it'd be fine to re-define that in some future release, to allow
showing indirect children (probably only "leaves", and not intermediate
partitioned tables). Or "total_bytes" or other global progress.

Hmm. My expectation as a user is that partitions_total includes both
direct and indirect (leaf) child partitions, that it is set just once
at the start of the process, and that partitions_done increases from
zero to partitions_total as the index-build proceeds. I think that
should be achievable with a minimally invasive patch that doesn't
change any data structures.

I agree with all the review comments Tomas posted. In particular, this
shouldn't need any changes to IndexStmt. I think the best approach
would be to just add a new function to backend_progress.c that offsets
a specified progress parameter by a specified amount, so that you can
just increment partitions_done by one or more, at the appropriate
points.

Regards,
Dean

#14Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Tomas Vondra (#9)
1 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch, that introduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we can assume that the only process that can write into MyBEEntry of the current backend is the current backend itself, therefore looping to get consistent reads from this array is not required. Please correct me, if I am wrong here.

Attachments:

0001-create-index-progress-increment.patchapplication/octet-stream; name=0001-create-index-progress-increment.patch; x-unix-mode=0644Download
From 5f377fc90f86ee08a2c9487c4bbb2d48f0a25777 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH] create index progress increment

---
 doc/src/sgml/monitoring.sgml                  |  4 +-
 src/backend/commands/indexcmds.c              | 51 +++++++++++++++++--
 src/backend/utils/activity/backend_progress.c | 27 ++++++++++
 src/include/utils/backend_progress.h          |  1 +
 4 files changed, 76 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b6..a911900271 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6612,7 +6612,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of leaf partitions on which the index has been created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e..a6c1889ed1 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,29 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int		nleaves = 0;
+	List	*childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1242,14 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int total_parts;
+
+				total_parts = count_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1431,9 +1460,16 @@ DefineIndex(Oid relationId,
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					/*
+					 * If the index has been attached, count all of its leaf indexes as attached,
+					 * because PROGRESS_CREATEIDX_PARTITIONS_TOTAL includes them all initially.
+					 */
+					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+												count_leaf_partitions(childRelid));
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1520,14 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done, otherwise, increment
+		 * done partition counter, if the counter is initialized.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b1..64a4406e0b 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,33 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of own backend entry.
+ *-----------
+ */
+void pgstat_progress_incr_param(int index, int64 incr) {
+	volatile PgBackendStatus *beentry = MyBEEntry;
+  	int64 val;
+  	int before_changecount PG_USED_FOR_ASSERTS_ONLY;
+  	int after_changecount PG_USED_FOR_ASSERTS_ONLY;
+
+  	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+  	if (!beentry || !pgstat_track_activities)
+    	return;
+
+	/* Because backend is the only process that writes to its own status,
+	 * we don't need to do the looping to read the value. */
+  	val = beentry->st_progress_param[index];
+  	val += incr;
+
+  	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+  	beentry->st_progress_param[index] = val;
+  	PGSTAT_END_WRITE_ACTIVITY(beentry);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab..afedd22d11 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 val);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.37.1 (Apple Git-137.1)

#15Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#14)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Tue, Jan 31, 2023 at 07:32:20PM +0400, Ilya Gladyshev wrote:

17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch, that introduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we can assume that the only process that can write into MyBEEntry of the current backend is the current backend itself, therefore looping to get consistent reads from this array is not required. Please correct me, if I am wrong here.

Thanks for the updated patch.

I think you're right - pgstat_begin_read_activity() is used for cases
involving other backends. It ought to be safe for a process to read its
own status bits, since we know they're not also being written.

You changed DefineIndex() to update progress for the leaf indexes' when
called recursively. The caller updates the progress for "attached"
indexes, but not created ones. That allows providing fine-granularity
progress updates when using intermediate partitions, right ? (Rather
than updating the progress by more than one at a time in the case of
intermediate partitioning).

If my understanding is right, that's subtle, and adds a bit of
complexity to the current code, so could use careful commentary. I
suggest:

* If the index was attached, update progress for all its direct and
* indirect leaf indexes all at once. If the index was built by calling
* DefineIndex() recursively, the called function is responsible for
* updating the progress report for built indexes.

...

* If this is the top-level index, we're done. When called recursively
* for child tables, the done partition counter is incremented now,
* rather than in the caller.

I guess you know that there were compiler warnings (related to your
question).
https://cirrus-ci.com/task/6571212386598912

pgstat_progress_incr_param() could call pgstat_progress_update_param()
rather than using its own Assert() and WRITE_ACTIVITY calls. I'm not
sure which I prefer, though.

Also, there are whitespace/tab/style issues in
pgstat_progress_incr_param().

--
Justin

#16Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#15)
1 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

1 февр. 2023 г., в 08:29, Justin Pryzby <pryzby@telsasoft.com> написал(а):

On Tue, Jan 31, 2023 at 07:32:20PM +0400, Ilya Gladyshev wrote:

17 янв. 2023 г., в 23:44, Tomas Vondra <tomas.vondra@enterprisedb.com> написал(а):
Do we actually need the new parts_done field? I mean, we already do
track the value - at PROGRESS_CREATEIDX_PARTITIONS_DONE index in the
st_progress_param array. Can't we simply read it from there? Then we
would not have ABI issues with the new field added to IndexStmt.

I think it’s a good approach and it could be useful outside of scope of this patch too. So I have attached a patch, that introduces pgstat_progress_incr_param function for this purpose. There’s one thing I am not sure about, IIUC, we can assume that the only process that can write into MyBEEntry of the current backend is the current backend itself, therefore looping to get consistent reads from this array is not required. Please correct me, if I am wrong here.

Thanks for the updated patch.

I think you're right - pgstat_begin_read_activity() is used for cases
involving other backends. It ought to be safe for a process to read its
own status bits, since we know they're not also being written.

You changed DefineIndex() to update progress for the leaf indexes' when
called recursively. The caller updates the progress for "attached"
indexes, but not created ones. That allows providing fine-granularity
progress updates when using intermediate partitions, right ? (Rather
than updating the progress by more than one at a time in the case of
intermediate partitioning).

If my understanding is right, that's subtle, and adds a bit of
complexity to the current code, so could use careful commentary. I
suggest:

* If the index was attached, update progress for all its direct and
* indirect leaf indexes all at once. If the index was built by calling
* DefineIndex() recursively, the called function is responsible for
* updating the progress report for built indexes.

...

* If this is the top-level index, we're done. When called recursively
* for child tables, the done partition counter is incremented now,
* rather than in the caller.

Yes, you are correct about the intended behavior, I added your comments to the patch.

I guess you know that there were compiler warnings (related to your
question).
https://cirrus-ci.com/task/6571212386598912

pgstat_progress_incr_param() could call pgstat_progress_update_param()
rather than using its own Assert() and WRITE_ACTIVITY calls. I'm not
sure which I prefer, though.

Also, there are whitespace/tab/style issues in
pgstat_progress_incr_param().

--
Justin

Thank you for the review, I fixed the aforementioned issues in the v2.

Attachments:

v2-0001-create-index-progress-increment.patchapplication/octet-stream; name=v2-0001-create-index-progress-increment.patch; x-unix-mode=0644Download
From 94ec0a5762b78c571127999be9e051ba74d88424 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH v2] create index progress increment

---
 doc/src/sgml/monitoring.sgml                  |  4 +-
 src/backend/commands/indexcmds.c              | 54 +++++++++++++++++--
 src/backend/utils/activity/backend_progress.c | 25 +++++++++
 src/include/utils/backend_progress.h          |  1 +
 4 files changed, 77 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b6..a911900271 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6612,7 +6612,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of leaf partitions on which the index has been created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e..0f925c3a69 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,29 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int		nleaves = 0;
+	List	*childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1242,14 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int total_parts;
+
+				total_parts = count_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1431,9 +1460,18 @@ DefineIndex(Oid relationId,
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					/*
+					 * If the index was attached, update progress for all its direct and
+					 * indirect leaf indexes all at once.  If the index was built by calling
+					 * DefineIndex() recursively, the called function is responsible for
+					 * updating the progress report for built indexes.
+					 */
+					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
+												count_leaf_partitions(childRelid));
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1522,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done. When called recursively
+		 * for child tables, the done partition counter is incremented now,
+		 * rather than in the caller.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b1..45f7e7144b 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,31 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of own backend entry.
+ *-----------
+ */
+void pgstat_progress_incr_param(int index, int64 incr) {
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64 val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because current backend is the only process that writes to its own status,
+	 * we don't need to do the looping to read the value.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab..a84752ade9 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.37.1 (Apple Git-137.1)

#17Alvaro Herrera
alvherre@alvh.no-ip.org
In reply to: Ilya Gladyshev (#16)
Re: Progress report of CREATE INDEX for nested partitioned tables

Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

I think there's probably not much point optimizing it further than that.
If there was, then we could think about creating a data representation
that we can build for the entire partitioning hierarchy in a single pass
with the count of leaf partitions that sit below each specific non-leaf;
but I think that's just over-engineering.

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"After a quick R of TFM, all I can say is HOLY CR** THAT IS COOL! PostgreSQL was
amazing when I first started using it at 7.2, and I'm continually astounded by
learning new features and techniques made available by the continuing work of
the development team."
Berend Tober, http://archives.postgresql.org/pgsql-hackers/2007-08/msg01009.php

#18Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Alvaro Herrera (#17)
1 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):

Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

Attachments:

v3-0001-create-index-progress-increment.patchapplication/octet-stream; name=v3-0001-create-index-progress-increment.patch; x-unix-mode=0644Download
From f44ee1b30249fc62418725279356a0959d27e864 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH v3] create index progress increment

---
 doc/src/sgml/monitoring.sgml                  |  4 +-
 src/backend/commands/indexcmds.c              | 64 +++++++++++++++++--
 src/backend/utils/activity/backend_progress.c | 25 ++++++++
 src/include/utils/backend_progress.h          |  1 +
 4 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b6..a911900271 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6612,7 +6612,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of leaf partitions on which the index has been created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e..936b4e3c1d 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,30 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1243,14 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int total_parts;
+
+				total_parts = count_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1280,7 @@ DefineIndex(Oid relationId,
 			{
 				Oid			childRelid = part_oids[i];
 				Relation	childrel;
+				char		child_relkind;
 				Oid			child_save_userid;
 				int			child_save_sec_context;
 				int			child_save_nestlevel;
@@ -1259,6 +1290,7 @@ DefineIndex(Oid relationId,
 				bool		found = false;
 
 				childrel = table_open(childRelid, lockmode);
+				child_relkind = RelationGetForm(childrel)->relkind;
 
 				GetUserIdAndSecContext(&child_save_userid,
 									   &child_save_sec_context);
@@ -1431,9 +1463,25 @@ DefineIndex(Oid relationId,
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					int attached_parts = 1;
+
+					if (RELKIND_HAS_PARTITIONS(child_relkind))
+						attached_parts = count_leaf_partitions(childRelid);
+
+					/*
+					 * If the index was attached, we need to update progress
+					 * here, in its parent. For a partitioned index, we need
+					 * to mark all of its children that were included in
+					 * PROGRESS_CREATEIDX_PARTITIONS_TOTAL as done. If the
+					 * index was built by calling DefineIndex() recursively,
+					 * the called function is responsible for updating the
+					 * progress report for built indexes.
+					 */
+					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, attached_parts);
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1532,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done. When called recursively
+		 * for child tables, the done partition counter is incremented now,
+		 * rather than in the caller.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b1..45f7e7144b 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,31 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of own backend entry.
+ *-----------
+ */
+void pgstat_progress_incr_param(int index, int64 incr) {
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64 val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because current backend is the only process that writes to its own status,
+	 * we don't need to do the looping to read the value.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab..a84752ade9 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.37.1 (Apple Git-137.1)

#19Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Ilya Gladyshev (#18)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):

Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

Kind regards,

Matthias van de Meent

#20Justin Pryzby
pryzby@telsasoft.com
In reply to: Matthias van de Meent (#19)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:

On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

But the TOTAL is constant, right ? Updating the total when being called
recursively is the problem these patches fix.

--
Justin

#21Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Justin Pryzby (#20)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, 1 Feb 2023 at 16:53, Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:

On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

But the TOTAL is constant, right ? Updating the total when being called
recursively is the problem these patches fix.

If that's the case, then I'm not seeing the 'fix' part of the patch. I
thought this patch was fixing the provably incorrect TOTAL value where
DONE > TOTAL due to the recursive operation overwriting the DONE/TOTAL
values instead of updating them.

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

An example hierarchy:
CREATE TABLE parent (a, b) partition by list (a);
CREATE TABLE a1
PARTITION OF parent FOR VALUES IN (1)
PARTITION BY LIST (b);
CREATE TABLE a1bd
PARTITION OF a1 DEFAULT;

CREATE TABLE a2
PARTITION OF parent FOR VALUES IN (2)
PARTITION BY LIST (b);
CREATE TABLE a2bd
PARTITION OF a2 DEFAULT;

INSERT INTO parent (a, b) SELECT * from generate_series(1, 2) a(a)
cross join generate_series(1, 100000),b(b);
CREATE INDEX ON parent(a,b);

This will only discover that a2bd will need to be indexed after a1bd
is done (or vice versa, depending on which order a1 and a2 are
processed in the ).

Kind regards,

Matthias van de Meent

#22Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Matthias van de Meent (#21)
Re: Progress report of CREATE INDEX for nested partitioned tables

1 февр. 2023 г., в 20:27, Matthias van de Meent <boekewurm+postgres@gmail.com> написал(а):

On Wed, 1 Feb 2023 at 16:53, Justin Pryzby <pryzby@telsasoft.com <mailto:pryzby@telsasoft.com>> wrote:

On Wed, Feb 01, 2023 at 04:21:35PM +0100, Matthias van de Meent wrote:

On Wed, 1 Feb 2023 at 15:21, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 16:01, Alvaro Herrera <alvherre@alvh.no-ip.org> написал(а):
Hmm, count_leaf_partitions has to scan pg_inherits and do a syscache
lookup for every single element therein ... this sounds slow.

In one of the callsites, we already have the partition descriptor
available. We could just scan partdesc->is_leaf[] and add one for each
'true' value we see there.

The problem is that partdesc contains only direct children of the table and we need all the children down the inheritance tree to count the total number of leaf partitions in the first callsite.

In the other callsite, we had the table open just a few lines before the
place you call count_leaf_partitions. Maybe we can rejigger things by
examining its state before closing it: if relkind is not partitioned we
know leaf_partitions=0, and only if partitioned we count leaf partitions.
I think that would save some work. I also wonder if it's worth writing
a bespoke function for counting leaf partitions rather than relying on
find_all_inheritors.

Sure, added this condition to avoid the extra work here.

When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.

I think we should also add a note about the (now) non-constant nature
of the value, something along the lines of "This value is updated as
we're processing and discovering partitioned tables in the partition
hierarchy".

But the TOTAL is constant, right ? Updating the total when being called
recursively is the problem these patches fix.

If that's the case, then I'm not seeing the 'fix' part of the patch. I
thought this patch was fixing the provably incorrect TOTAL value where
DONE > TOTAL due to the recursive operation overwriting the DONE/TOTAL
values instead of updating them.

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

The TOTAL in the patch is set only when processing the top-level parent and it is not updated when we recurse, so yes, it is constant. From v3:

@@ -1219,8 +1243,14 @@ DefineIndex(Oid relationId,
Relation parentIndex;
TupleDesc parentDesc;

-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int total_parts;
+
+				total_parts = count_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}

It is set to the total number of children on all levels of the hierarchy, not just the current one, so the total value doesn’t need to be updated later, because it is set to the correct value from the very beginning.

It is the DONE counter that is updated, and when we attach an index of a partition that is itself a partitioned table (like a2 in your example, if it already had an index created), it will be updated by the number of children of the partition.

@@ -1431,9 +1463,25 @@ DefineIndex(Oid relationId,
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					int attached_parts = 1;
+
+					if (RELKIND_HAS_PARTITIONS(child_relkind))
+						attached_parts = count_leaf_partitions(childRelid);
+
+					/*
+					 * If the index was attached, we need to update progress
+					 * here, in its parent. For a partitioned index, we need
+					 * to mark all of its children that were included in
+					 * PROGRESS_CREATEIDX_PARTITIONS_TOTAL as done. If the
+					 * index was built by calling DefineIndex() recursively,
+					 * the called function is responsible for updating the
+					 * progress report for built indexes.
+					 */
+					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, attached_parts);
+				}

- pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
- i + 1);
free_attrmap(attmap);
}

#23Matthias van de Meent
boekewurm+postgres@gmail.com
In reply to: Ilya Gladyshev (#22)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 20:27, Matthias van de Meent <boekewurm+postgres@gmail.com> написал(а):

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

The TOTAL in the patch is set only when processing the top-level parent and it is not updated when we recurse, so yes, it is constant. From v3:

Ugh, I misread the patch, more specifically count_leaf_partitions and
the !OidIsValid(parentIndexId) condition changes.

You are correct, sorry for the noise.

Kind regards,

Matthias van de Meent

#24Justin Pryzby
pryzby@telsasoft.com
In reply to: Matthias van de Meent (#23)
4 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, Feb 01, 2023 at 07:24:48PM +0100, Matthias van de Meent wrote:

On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 20:27, Matthias van de Meent <boekewurm+postgres@gmail.com> написал(а):

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

The TOTAL in the patch is set only when processing the top-level parent and it is not updated when we recurse, so yes, it is constant. From v3:

Ugh, I misread the patch, more specifically count_leaf_partitions and
the !OidIsValid(parentIndexId) condition changes.

You are correct, sorry for the noise.

That suggests that the comments could've been more clear. I added a
comment suggested by Tomas and adjusted some others and wrote a commit
message. I even ran pgindent for about the 3rd time ever.

002 are my changes as a separate patch, which you could apply to your
local branch.

And 003/4 are assertions that I wrote to demonstrate the problem and the
verify the fixes, but not being proposed for commit.

--
Justin

Attachments:

0001-create-index-progress-increment.patchtext/x-diff; charset=us-asciiDownload
From 375961e18aaa7ed7b2ebee972ad07c7a38099ef4 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/4] create index progress increment

---
 doc/src/sgml/monitoring.sgml                  |  4 +-
 src/backend/commands/indexcmds.c              | 64 +++++++++++++++++--
 src/backend/utils/activity/backend_progress.c | 25 ++++++++
 src/include/utils/backend_progress.h          |  1 +
 4 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b67..a911900271c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of leaf partitions on which the index is to be created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6612,7 +6612,7 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of leaf partitions on which the index has been created or attached.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..936b4e3c1db 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,30 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid	partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1243,14 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			if (!OidIsValid(parentIndexId))
+			{
+				int total_parts;
+
+				total_parts = count_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1280,7 @@ DefineIndex(Oid relationId,
 			{
 				Oid			childRelid = part_oids[i];
 				Relation	childrel;
+				char		child_relkind;
 				Oid			child_save_userid;
 				int			child_save_sec_context;
 				int			child_save_nestlevel;
@@ -1259,6 +1290,7 @@ DefineIndex(Oid relationId,
 				bool		found = false;
 
 				childrel = table_open(childRelid, lockmode);
+				child_relkind = RelationGetForm(childrel)->relkind;
 
 				GetUserIdAndSecContext(&child_save_userid,
 									   &child_save_sec_context);
@@ -1431,9 +1463,25 @@ DefineIndex(Oid relationId,
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					int attached_parts = 1;
+
+					if (RELKIND_HAS_PARTITIONS(child_relkind))
+						attached_parts = count_leaf_partitions(childRelid);
+
+					/*
+					 * If the index was attached, we need to update progress
+					 * here, in its parent. For a partitioned index, we need
+					 * to mark all of its children that were included in
+					 * PROGRESS_CREATEIDX_PARTITIONS_TOTAL as done. If the
+					 * index was built by calling DefineIndex() recursively,
+					 * the called function is responsible for updating the
+					 * progress report for built indexes.
+					 */
+					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, attached_parts);
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1532,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done. When called recursively
+		 * for child tables, the done partition counter is incremented now,
+		 * rather than in the caller.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b19..45f7e7144b4 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,31 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of own backend entry.
+ *-----------
+ */
+void pgstat_progress_incr_param(int index, int64 incr) {
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64 val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because current backend is the only process that writes to its own status,
+	 * we don't need to do the looping to read the value.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab6..a84752ade99 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.25.1

0002-s-fix-CREATE-INDEX-progress-report-with-nested-parti.patchtext/x-diff; charset=us-asciiDownload
From 3d78449f1c3d2fef8e96f4874f13004325026305 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 1 Feb 2023 21:50:21 -0600
Subject: [PATCH 2/4] s!fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, and if the attached index is
partitioned, the counter is incremented to account for each of its leaf
partitions.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml                  | 10 +++++--
 src/backend/commands/indexcmds.c              | 30 +++++++++++--------
 src/backend/utils/activity/backend_progress.c | 13 ++++----
 3 files changed, 34 insertions(+), 19 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a911900271c..fa139dcece7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of leaf partitions on which the index is to be created or attached.
+       the total number of partitions on which the index is to be created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6612,7 +6615,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of leaf partitions on which the index has been created or attached.
+       the number of partitions on which the index has been created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 936b4e3c1db..84c84c41acc 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -144,7 +144,7 @@ count_leaf_partitions(Oid relid)
 
 	foreach(lc, childs)
 	{
-		Oid	partrelid = lfirst_oid(lc);
+		Oid			partrelid = lfirst_oid(lc);
 
 		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
 			nleaves++;
@@ -1243,9 +1243,13 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
+			/*
+			 * Set the total number of partitions at the start of the command,
+			 * but don't change it when being called recursively.
+			 */
 			if (!OidIsValid(parentIndexId))
 			{
-				int total_parts;
+				int			total_parts;
 
 				total_parts = count_leaf_partitions(relationId);
 				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
@@ -1465,19 +1469,21 @@ DefineIndex(Oid relationId,
 				}
 				else
 				{
-					int attached_parts = 1;
+					int			attached_parts;
 
-					if (RELKIND_HAS_PARTITIONS(child_relkind))
-						attached_parts = count_leaf_partitions(childRelid);
+					/*
+					 * Avoid the overhead of counting partitions when that
+					 * can't apply.
+					 */
+					attached_parts = RELKIND_HAS_PARTITIONS(child_relkind) ?
+						count_leaf_partitions(childRelid) : 1;
 
 					/*
-					 * If the index was attached, we need to update progress
-					 * here, in its parent. For a partitioned index, we need
-					 * to mark all of its children that were included in
-					 * PROGRESS_CREATEIDX_PARTITIONS_TOTAL as done. If the
-					 * index was built by calling DefineIndex() recursively,
-					 * the called function is responsible for updating the
-					 * progress report for built indexes.
+					 * If a pre-existing index was attached, the progress
+					 * report is updated here.  If the index was partitioned,
+					 * all the children that were counted towards
+					 * PROGRESS_CREATEIDX_PARTITIONS_TOTAL are counted as
+					 * done.
 					 */
 					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, attached_parts);
 				}
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 45f7e7144b4..2a9994b98fd 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -61,12 +61,14 @@ pgstat_progress_update_param(int index, int64 val)
 /*-----------
  * pgstat_progress_incr_param() -
  *
- * Increment index'th member in st_progress_param[] of own backend entry.
+ * Increment index'th member in st_progress_param[] of the current backend.
  *-----------
  */
-void pgstat_progress_incr_param(int index, int64 incr) {
+void
+pgstat_progress_incr_param(int index, int64 incr)
+{
 	volatile PgBackendStatus *beentry = MyBEEntry;
-	int64 val;
+	int64		val;
 
 	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
 
@@ -74,8 +76,9 @@ void pgstat_progress_incr_param(int index, int64 incr) {
 		return;
 
 	/*
-	 * Because current backend is the only process that writes to its own status,
-	 * we don't need to do the looping to read the value.
+	 * Because no other process should write to this backend's own status, we
+	 * can read its value from shared memory without needing to loop to ensure
+	 * its consistency.
 	 */
 	val = beentry->st_progress_param[index];
 	val += incr;
-- 
2.25.1

0003-assertions-for-progress-reporting.patchtext/x-diff; charset=us-asciiDownload
From f484d5c9b602f158e86f9507b662b7bbbb4a19b7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 1 Feb 2023 10:23:53 -0600
Subject: [PATCH 3/4] assertions for progress reporting

---
 src/backend/utils/activity/backend_progress.c | 83 +++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 2a9994b98fd..d0566430fab 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,84 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*
+ * Check for consistency of progress data (current < total).
+ *
+ * Check during pgstat_progress_updates_*() rather than only from
+ * pgstat_progress_end_command() to catch issues with uninitialized/stale data
+ * from previous progress commands.
+ *
+ * If a command fails due to interrupt or error, the values may be less than
+ * the expected final value.
+ */
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64 *a = beentry->st_progress_param;
+
+	switch (beentry->st_progress_command)
+	{
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+				   a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+			break;
+
+		case PROGRESS_COMMAND_ANALYZE:
+			Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <=
+				   a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <=
+				   a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] <=
+				   a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]);
+			/* FALLTHROUGH */
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			Assert(a[PROGRESS_CREATEIDX_TUPLES_DONE] <=
+				   a[PROGRESS_CREATEIDX_TUPLES_TOTAL]);
+			Assert(a[PROGRESS_CREATEIDX_PARTITIONS_DONE] <=
+				   a[PROGRESS_CREATEIDX_PARTITIONS_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			/* progress reporting is optional for these */
+			if (a[PROGRESS_BASEBACKUP_BACKUP_TOTAL] >= 0)
+			{
+				Assert(a[PROGRESS_BASEBACKUP_BACKUP_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_BACKUP_TOTAL]);
+				Assert(a[PROGRESS_BASEBACKUP_TBLSPC_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_TBLSPC_TOTAL]);
+			}
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+#if 0
+			//This currently fails file_fdw tests, since pgstat_prorgress evidently fails
+				// to support simultaneous copy commands, as happens during JOIN.
+			/* bytes progress is not available in all cases */
+				if (a[PROGRESS_COPY_BYTES_TOTAL] > 0)
+				//Assert(a[PROGRESS_COPY_BYTES_PROCESSED] <= a[PROGRESS_COPY_BYTES_TOTAL]);
+			if (a[PROGRESS_COPY_BYTES_PROCESSED] > a[PROGRESS_COPY_BYTES_TOTAL])
+				elog(WARNING, "PROGRESS_COPY_BYTES_PROCESSED %ld %ld",
+					 a[PROGRESS_COPY_BYTES_PROCESSED],
+					 a[PROGRESS_COPY_BYTES_TOTAL]);
+#endif
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;				/* Do nothing */
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -56,6 +135,8 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
@@ -113,6 +194,8 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
 	}
 
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
-- 
2.25.1

0004-f-also-assert-that-progress-values-don-t-go-backward.patchtext/x-diff; charset=us-asciiDownload
From ee27366bf3f7f5fd6a9884462476b4240cdd5c9a Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 21 Jan 2023 21:41:04 -0600
Subject: [PATCH 4/4] f! also assert that progress values don't go backwards
 and the total is constant

See also:
https://www.postgresql.org/message-id/CA%2BTgmoYSvEP3weQaCPGf6%2BDXLy2__JbJUYtoXyWP%3DqHcyGbihA%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c          | 37 +++++++++
 src/backend/commands/analyze.c                | 10 ++-
 src/backend/storage/lmgr/lmgr.c               | 24 +++---
 src/backend/utils/activity/backend_progress.c | 77 +++++++++++++++++++
 4 files changed, 135 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f38..22661f6a292 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1042,6 +1042,13 @@ lazy_scan_heap(LVRelState *vacrel)
 
 				/* Forget the LP_DEAD items that we just vacuumed */
 				dead_items->num_items = 0;
+				{
+					int const	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+					const int64 progress_vals[] = {0};
+
+					pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+				}
+
 
 				/*
 				 * Periodically perform FSM vacuuming to make newly-freed
@@ -2199,6 +2206,13 @@ lazy_vacuum(LVRelState *vacrel)
 	{
 		Assert(!vacrel->do_index_cleanup);
 		vacrel->dead_items->num_items = 0;
+		{
+			int const	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+			const int64 progress_vals[] = {0};
+
+			pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+		}
+
 		return;
 	}
 
@@ -2301,6 +2315,13 @@ lazy_vacuum(LVRelState *vacrel)
 	 * vacuum)
 	 */
 	vacrel->dead_items->num_items = 0;
+
+	{
+		int const	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
+
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 }
 
 /*
@@ -2414,12 +2435,23 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 	BlockNumber vacuumed_pages = 0;
 	Buffer		vmbuffer = InvalidBuffer;
 	LVSavedErrInfo saved_err_info;
+#if 0
+	int const	progress_inds[] = {
+		PROGRESS_VACUUM_PHASE,
+		PROGRESS_VACUUM_NUM_DEAD_TUPLES,
+	};
+	const int64 progress_vals[] = {
+		PROGRESS_VACUUM_PHASE_VACUUM_HEAP,
+		0,
+	};
+#endif
 
 	Assert(vacrel->do_index_vacuuming);
 	Assert(vacrel->do_index_cleanup);
 	Assert(vacrel->num_index_scans > 0);
 
 	/* Report that we are now vacuuming the heap */
+	//pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
 
@@ -3190,7 +3222,12 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
 	dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
 	dead_items->max_items = max_items;
 	dead_items->num_items = 0;
+	{
+		int const	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
 
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 	vacrel->dead_items = dead_items;
 }
 
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c86e690980e..96710b84558 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1145,6 +1145,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	TableScanDesc scan;
 	BlockNumber nblocks;
 	BlockNumber blksdone = 0;
+	int64		progress_vals[2] = {0};
+	int const	progress_inds[2] = {
+		PROGRESS_ANALYZE_BLOCKS_DONE,
+		PROGRESS_ANALYZE_BLOCKS_TOTAL
+	};
+
 #ifdef USE_PREFETCH
 	int			prefetch_maximum = 0;	/* blocks to prefetch if enabled */
 	BlockSamplerData prefetch_bs;
@@ -1169,8 +1175,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 #endif
 
 	/* Report sampling block numbers */
-	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
-								 nblocks);
+	progress_vals[1] = nblocks;
+	pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a6726..8666d850660 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -912,6 +912,15 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	int			total = 0;
 	int			done = 0;
 
+	const int	index[] = {
+		PROGRESS_WAITFOR_TOTAL,
+		PROGRESS_WAITFOR_DONE,
+		PROGRESS_WAITFOR_CURRENT_PID
+	};
+	const int64 values[] = {
+		0, 0, 0
+	};
+
 	/* Done if no locks to wait for */
 	if (locktags == NIL)
 		return;
@@ -930,7 +939,10 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	}
 
 	if (progress)
+	{
+		pgstat_progress_update_multi_param(3, index, values);
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, total);
+	}
 
 	/*
 	 * Note: GetLockConflicts() never reports our own xid, hence we need not
@@ -960,19 +972,9 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 				pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, ++done);
 		}
 	}
-	if (progress)
-	{
-		const int	index[] = {
-			PROGRESS_WAITFOR_TOTAL,
-			PROGRESS_WAITFOR_DONE,
-			PROGRESS_WAITFOR_CURRENT_PID
-		};
-		const int64 values[] = {
-			0, 0, 0
-		};
 
+	if (progress)
 		pgstat_progress_update_multi_param(3, index, values);
-	}
 
 	list_free_deep(holders);
 }
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d0566430fab..707ab929fac 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -116,6 +116,74 @@ pgstat_progress_asserts(void)
 	}
 }
 
+static void
+pgstat_progress_assert_forward_progress(int command, int index,
+										int64 oldval, int64 newval)
+{
+	switch (command)
+	{
+		case PROGRESS_COMMAND_ANALYZE:
+
+			/*
+			 * phase goes backwards for inheritance tables, which are sampled
+			 * twice
+			 */
+			if (index != PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID &&
+				index != PROGRESS_ANALYZE_PHASE)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_ANALYZE_BLOCKS_TOTAL ||
+				index == PROGRESS_ANALYZE_EXT_STATS_TOTAL ||
+				index == PROGRESS_ANALYZE_CHILD_TABLES_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			if (index != PROGRESS_CLUSTER_INDEX_RELID)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_CLUSTER_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			if (index != PROGRESS_CREATEIDX_INDEX_OID &&
+				index != PROGRESS_CREATEIDX_SUBPHASE &&
+				index != PROGRESS_WAITFOR_CURRENT_PID &&
+				index != PROGRESS_CREATEIDX_ACCESS_METHOD_OID)
+				Assert(newval >= oldval);
+
+			if (index == PROGRESS_CREATEIDX_TUPLES_TOTAL ||
+				index == PROGRESS_CREATEIDX_PARTITIONS_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL &&
+				oldval == 0 && newval == -1)
+				return;			/* Do nothing: this is the initial "null"
+								 * state before the size is estimated */
+			Assert(newval >= oldval);
+
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL ||
+				index == PROGRESS_BASEBACKUP_TBLSPC_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_COPY_BYTES_TOTAL)
+				Assert(oldval == 0);
+			break;
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_VACUUM_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -132,6 +200,15 @@ pgstat_progress_update_param(int index, int64 val)
 	if (!beentry || !pgstat_track_activities)
 		return;
 
+	if (index != PROGRESS_SCAN_BLOCKS_DONE)
+	{
+		/* Check that progress does not go backwards */
+		int64		oldval = beentry->st_progress_param[index];
+
+		pgstat_progress_assert_forward_progress(beentry->st_progress_command,
+												index, oldval, val);
+	}
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
-- 
2.25.1

#25Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#24)
3 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Thu, Feb 02, 2023 at 09:18:07AM -0600, Justin Pryzby wrote:

On Wed, Feb 01, 2023 at 07:24:48PM +0100, Matthias van de Meent wrote:

On Wed, 1 Feb 2023 at 18:51, Ilya Gladyshev <ilya.v.gladyshev@gmail.com> wrote:

1 февр. 2023 г., в 20:27, Matthias van de Meent <boekewurm+postgres@gmail.com> написал(а):

In HEAD we set TOTAL to whatever number partitioned table we're
currently processing has - regardless of whether we're the top level
statement.
With the patch we instead add the number of child relations to that
count, for which REL_HAS_STORAGE(child) -- or at least, in the v3
posted by Ilya. Approximately immediately after updating that count we
recurse to the child relations, and that only returns once it is done
creating the indexes, so both TOTAL and DONE go up as we process more
partitions in the hierarchy.

The TOTAL in the patch is set only when processing the top-level parent and it is not updated when we recurse, so yes, it is constant. From v3:

Ugh, I misread the patch, more specifically count_leaf_partitions and
the !OidIsValid(parentIndexId) condition changes.

You are correct, sorry for the noise.

That suggests that the comments could've been more clear. I added a
comment suggested by Tomas and adjusted some others and wrote a commit
message. I even ran pgindent for about the 3rd time ever.

002 are my changes as a separate patch, which you could apply to your
local branch.

And 003/4 are assertions that I wrote to demonstrate the problem and the
verify the fixes, but not being proposed for commit.

That was probably a confusing way to present it - I should've sent the
relative diff as a .txt rather than as patch 002.

This squishes together 001/2 as the main patch.
I believe it's ready.

--
Justin

Attachments:

0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patchtext/x-diff; charset=us-asciiDownload
From ed643acace062cad15cd6ea9dc76a663de9c97d4 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, and if the attached index is
partitioned, the counter is incremented to account for each of its leaf
partitions.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml                  | 10 ++-
 src/backend/commands/indexcmds.c              | 70 +++++++++++++++++--
 src/backend/utils/activity/backend_progress.c | 28 ++++++++
 src/include/utils/backend_progress.h          |  1 +
 4 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 1756f1a4b67..fa139dcece7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6601,7 +6601,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of partitions on which the index is to be created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6612,7 +6615,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of partitions on which the index has been created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..84c84c41acc 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,30 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions, excluding foreign
+ * tables.
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid			partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1243,18 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			/*
+			 * Set the total number of partitions at the start of the command,
+			 * but don't change it when being called recursively.
+			 */
+			if (!OidIsValid(parentIndexId))
+			{
+				int			total_parts;
+
+				total_parts = count_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1284,7 @@ DefineIndex(Oid relationId,
 			{
 				Oid			childRelid = part_oids[i];
 				Relation	childrel;
+				char		child_relkind;
 				Oid			child_save_userid;
 				int			child_save_sec_context;
 				int			child_save_nestlevel;
@@ -1259,6 +1294,7 @@ DefineIndex(Oid relationId,
 				bool		found = false;
 
 				childrel = table_open(childRelid, lockmode);
+				child_relkind = RelationGetForm(childrel)->relkind;
 
 				GetUserIdAndSecContext(&child_save_userid,
 									   &child_save_sec_context);
@@ -1431,9 +1467,27 @@ DefineIndex(Oid relationId,
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					int			attached_parts;
+
+					/*
+					 * Avoid the overhead of counting partitions when that
+					 * can't apply.
+					 */
+					attached_parts = RELKIND_HAS_PARTITIONS(child_relkind) ?
+						count_leaf_partitions(childRelid) : 1;
+
+					/*
+					 * If a pre-existing index was attached, the progress
+					 * report is updated here.  If the index was partitioned,
+					 * all the children that were counted towards
+					 * PROGRESS_CREATEIDX_PARTITIONS_TOTAL are counted as
+					 * done.
+					 */
+					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, attached_parts);
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1538,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done. When called recursively
+		 * for child tables, the done partition counter is incremented now,
+		 * rather than in the caller.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b19..2a9994b98fd 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,34 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of the current backend.
+ *-----------
+ */
+void
+pgstat_progress_incr_param(int index, int64 incr)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64		val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because no other process should write to this backend's own status, we
+	 * can read its value from shared memory without needing to loop to ensure
+	 * its consistency.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab6..a84752ade99 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.25.1

0002-assertions-for-progress-reporting.patchtext/x-diff; charset=us-asciiDownload
From 184bf2d291944c70323c95c99174ab446fc05149 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 1 Feb 2023 10:23:53 -0600
Subject: [PATCH 2/3] assertions for progress reporting

---
 src/backend/utils/activity/backend_progress.c | 83 +++++++++++++++++++
 1 file changed, 83 insertions(+)

diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 2a9994b98fd..a91b5881b04 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,84 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*
+ * Check for consistency of progress data (current < total).
+ *
+ * Check during pgstat_progress_updates_*() rather than only from
+ * pgstat_progress_end_command() to catch issues with uninitialized/stale data
+ * from previous progress commands.
+ *
+ * If a command fails due to interrupt or error, the values may be less than
+ * the expected final value.
+ */
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64 *a = beentry->st_progress_param;
+
+	switch (beentry->st_progress_command)
+	{
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+				   a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+			break;
+
+		case PROGRESS_COMMAND_ANALYZE:
+			Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <=
+				   a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <=
+				   a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] <=
+				   a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]);
+			/* fall through because CLUSTER rebuilds indexes */
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			Assert(a[PROGRESS_CREATEIDX_TUPLES_DONE] <=
+				   a[PROGRESS_CREATEIDX_TUPLES_TOTAL]);
+			Assert(a[PROGRESS_CREATEIDX_PARTITIONS_DONE] <=
+				   a[PROGRESS_CREATEIDX_PARTITIONS_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			/* progress reporting is optional for these */
+			if (a[PROGRESS_BASEBACKUP_BACKUP_TOTAL] >= 0)
+			{
+				Assert(a[PROGRESS_BASEBACKUP_BACKUP_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_BACKUP_TOTAL]);
+				Assert(a[PROGRESS_BASEBACKUP_TBLSPC_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_TBLSPC_TOTAL]);
+			}
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+#if 0
+			//This currently fails file_fdw tests, since pgstat_prorgress evidently fails
+				// to support simultaneous copy commands, as happens during JOIN.
+			/* bytes progress is not available in all cases */
+				if (a[PROGRESS_COPY_BYTES_TOTAL] > 0)
+				//Assert(a[PROGRESS_COPY_BYTES_PROCESSED] <= a[PROGRESS_COPY_BYTES_TOTAL]);
+			if (a[PROGRESS_COPY_BYTES_PROCESSED] > a[PROGRESS_COPY_BYTES_TOTAL])
+				elog(WARNING, "PROGRESS_COPY_BYTES_PROCESSED %ld %ld",
+					 a[PROGRESS_COPY_BYTES_PROCESSED],
+					 a[PROGRESS_COPY_BYTES_TOTAL]);
+#endif
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;				/* Do nothing */
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -56,6 +135,8 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
@@ -113,6 +194,8 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
 	}
 
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
-- 
2.25.1

0003-f-also-assert-that-progress-values-don-t-go-backward.patchtext/x-diff; charset=us-asciiDownload
From e63816c3683eadb0a107843a2544e127e6c441d6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 21 Jan 2023 21:41:04 -0600
Subject: [PATCH 3/3] f! also assert that progress values don't go backwards
 and the total is constant

See also:
https://www.postgresql.org/message-id/CA%2BTgmoYSvEP3weQaCPGf6%2BDXLy2__JbJUYtoXyWP%3DqHcyGbihA%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c          | 37 +++++++++
 src/backend/commands/analyze.c                | 10 ++-
 src/backend/storage/lmgr/lmgr.c               | 24 +++---
 src/backend/utils/activity/backend_progress.c | 77 +++++++++++++++++++
 4 files changed, 135 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f38..22661f6a292 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1042,6 +1042,13 @@ lazy_scan_heap(LVRelState *vacrel)
 
 				/* Forget the LP_DEAD items that we just vacuumed */
 				dead_items->num_items = 0;
+				{
+					int const	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+					const int64 progress_vals[] = {0};
+
+					pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+				}
+
 
 				/*
 				 * Periodically perform FSM vacuuming to make newly-freed
@@ -2199,6 +2206,13 @@ lazy_vacuum(LVRelState *vacrel)
 	{
 		Assert(!vacrel->do_index_cleanup);
 		vacrel->dead_items->num_items = 0;
+		{
+			int const	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+			const int64 progress_vals[] = {0};
+
+			pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+		}
+
 		return;
 	}
 
@@ -2301,6 +2315,13 @@ lazy_vacuum(LVRelState *vacrel)
 	 * vacuum)
 	 */
 	vacrel->dead_items->num_items = 0;
+
+	{
+		int const	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
+
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 }
 
 /*
@@ -2414,12 +2435,23 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 	BlockNumber vacuumed_pages = 0;
 	Buffer		vmbuffer = InvalidBuffer;
 	LVSavedErrInfo saved_err_info;
+#if 0
+	int const	progress_inds[] = {
+		PROGRESS_VACUUM_PHASE,
+		PROGRESS_VACUUM_NUM_DEAD_TUPLES,
+	};
+	const int64 progress_vals[] = {
+		PROGRESS_VACUUM_PHASE_VACUUM_HEAP,
+		0,
+	};
+#endif
 
 	Assert(vacrel->do_index_vacuuming);
 	Assert(vacrel->do_index_cleanup);
 	Assert(vacrel->num_index_scans > 0);
 
 	/* Report that we are now vacuuming the heap */
+	//pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
 
@@ -3190,7 +3222,12 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
 	dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
 	dead_items->max_items = max_items;
 	dead_items->num_items = 0;
+	{
+		int const	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
 
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 	vacrel->dead_items = dead_items;
 }
 
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index c86e690980e..96710b84558 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1145,6 +1145,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	TableScanDesc scan;
 	BlockNumber nblocks;
 	BlockNumber blksdone = 0;
+	int64		progress_vals[2] = {0};
+	int const	progress_inds[2] = {
+		PROGRESS_ANALYZE_BLOCKS_DONE,
+		PROGRESS_ANALYZE_BLOCKS_TOTAL
+	};
+
 #ifdef USE_PREFETCH
 	int			prefetch_maximum = 0;	/* blocks to prefetch if enabled */
 	BlockSamplerData prefetch_bs;
@@ -1169,8 +1175,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 #endif
 
 	/* Report sampling block numbers */
-	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
-								 nblocks);
+	progress_vals[1] = nblocks;
+	pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a6726..8666d850660 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -912,6 +912,15 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	int			total = 0;
 	int			done = 0;
 
+	const int	index[] = {
+		PROGRESS_WAITFOR_TOTAL,
+		PROGRESS_WAITFOR_DONE,
+		PROGRESS_WAITFOR_CURRENT_PID
+	};
+	const int64 values[] = {
+		0, 0, 0
+	};
+
 	/* Done if no locks to wait for */
 	if (locktags == NIL)
 		return;
@@ -930,7 +939,10 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	}
 
 	if (progress)
+	{
+		pgstat_progress_update_multi_param(3, index, values);
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, total);
+	}
 
 	/*
 	 * Note: GetLockConflicts() never reports our own xid, hence we need not
@@ -960,19 +972,9 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 				pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, ++done);
 		}
 	}
-	if (progress)
-	{
-		const int	index[] = {
-			PROGRESS_WAITFOR_TOTAL,
-			PROGRESS_WAITFOR_DONE,
-			PROGRESS_WAITFOR_CURRENT_PID
-		};
-		const int64 values[] = {
-			0, 0, 0
-		};
 
+	if (progress)
 		pgstat_progress_update_multi_param(3, index, values);
-	}
 
 	list_free_deep(holders);
 }
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index a91b5881b04..3de10361222 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -116,6 +116,74 @@ pgstat_progress_asserts(void)
 	}
 }
 
+static void
+pgstat_progress_assert_forward_progress(int command, int index,
+										int64 oldval, int64 newval)
+{
+	switch (command)
+	{
+		case PROGRESS_COMMAND_ANALYZE:
+
+			/*
+			 * phase goes backwards for inheritance tables, which are sampled
+			 * twice
+			 */
+			if (index != PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID &&
+				index != PROGRESS_ANALYZE_PHASE)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_ANALYZE_BLOCKS_TOTAL ||
+				index == PROGRESS_ANALYZE_EXT_STATS_TOTAL ||
+				index == PROGRESS_ANALYZE_CHILD_TABLES_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			if (index != PROGRESS_CLUSTER_INDEX_RELID)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_CLUSTER_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			if (index != PROGRESS_CREATEIDX_INDEX_OID &&
+				index != PROGRESS_CREATEIDX_SUBPHASE &&
+				index != PROGRESS_WAITFOR_CURRENT_PID &&
+				index != PROGRESS_CREATEIDX_ACCESS_METHOD_OID)
+				Assert(newval >= oldval);
+
+			if (index == PROGRESS_CREATEIDX_TUPLES_TOTAL ||
+				index == PROGRESS_CREATEIDX_PARTITIONS_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL &&
+				oldval == 0 && newval == -1)
+				return;			/* Do nothing: this is the initial "null"
+								 * state before the size is estimated */
+			Assert(newval >= oldval);
+
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL ||
+				index == PROGRESS_BASEBACKUP_TBLSPC_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_COPY_BYTES_TOTAL)
+				Assert(oldval == 0);
+			break;
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_VACUUM_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -132,6 +200,15 @@ pgstat_progress_update_param(int index, int64 val)
 	if (!beentry || !pgstat_track_activities)
 		return;
 
+	if (index != PROGRESS_SCAN_BLOCKS_DONE)
+	{
+		/* Check that progress does not go backwards */
+		int64		oldval = beentry->st_progress_param[index];
+
+		pgstat_progress_assert_forward_progress(beentry->st_progress_command,
+												index, oldval, val);
+	}
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
-- 
2.25.1

#26Justin Pryzby
pryzby@telsasoft.com
In reply to: Justin Pryzby (#25)
3 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Wed, Feb 08, 2023 at 04:40:49PM -0600, Justin Pryzby wrote:

This squishes together 001/2 as the main patch.
I believe it's ready.

Update to address a compiler warning in the supplementary patches adding
assertions.

Attachments:

0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patchtext/x-diff; charset=us-asciiDownload
From 71427bf7cd9927af04513ba3fe99e481a8ba1f61 Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, and if the attached index is
partitioned, the counter is incremented to account for each of its leaf
partitions.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml                  | 10 ++-
 src/backend/commands/indexcmds.c              | 70 +++++++++++++++++--
 src/backend/utils/activity/backend_progress.c | 28 ++++++++
 src/include/utils/backend_progress.h          |  1 +
 4 files changed, 102 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index dca50707ad4..945d64ed2fa 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6896,7 +6896,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of partitions on which the index is to be created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6907,7 +6910,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of partitions on which the index has been created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..147265ddcca 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,30 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions.  Note that this
+ * also excludes foreign tables.
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid			partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -1219,8 +1243,18 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			/*
+			 * Set the total number of partitions at the start of the command;
+			 * don't update it when being called recursively.
+			 */
+			if (!OidIsValid(parentIndexId))
+			{
+				int			total_parts;
+
+				total_parts = count_leaf_partitions(relationId);
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1284,7 @@ DefineIndex(Oid relationId,
 			{
 				Oid			childRelid = part_oids[i];
 				Relation	childrel;
+				char		child_relkind;
 				Oid			child_save_userid;
 				int			child_save_sec_context;
 				int			child_save_nestlevel;
@@ -1259,6 +1294,7 @@ DefineIndex(Oid relationId,
 				bool		found = false;
 
 				childrel = table_open(childRelid, lockmode);
+				child_relkind = RelationGetForm(childrel)->relkind;
 
 				GetUserIdAndSecContext(&child_save_userid,
 									   &child_save_sec_context);
@@ -1431,9 +1467,27 @@ DefineIndex(Oid relationId,
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					int			attached_parts;
+
+					/*
+					 * Avoid the overhead of counting partitions when that
+					 * can't apply.
+					 */
+					attached_parts = RELKIND_HAS_PARTITIONS(child_relkind) ?
+						count_leaf_partitions(childRelid) : 1;
+
+					/*
+					 * If a pre-existing index was attached, the progress
+					 * report is updated here.  If the index was partitioned,
+					 * all the children that were counted towards
+					 * PROGRESS_CREATEIDX_PARTITIONS_TOTAL are counted as
+					 * done.
+					 */
+					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, attached_parts);
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1538,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done. When called recursively
+		 * for child tables, the done partition counter is incremented now,
+		 * rather than in the caller.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b19..2a9994b98fd 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,34 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of the current backend.
+ *-----------
+ */
+void
+pgstat_progress_incr_param(int index, int64 incr)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64		val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because no other process should write to this backend's own status, we
+	 * can read its value from shared memory without needing to loop to ensure
+	 * its consistency.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab6..a84752ade99 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.34.1

0002-assertions-for-progress-reporting.patchtext/x-diff; charset=us-asciiDownload
From 30c5da099a6b75d7e4ce860815907326a361b042 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 1 Feb 2023 10:23:53 -0600
Subject: [PATCH 2/3] assertions for progress reporting

---
 src/backend/utils/activity/backend_progress.c | 84 +++++++++++++++++++
 1 file changed, 84 insertions(+)

diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 2a9994b98fd..63f9482b175 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,85 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*
+ * Check for consistency of progress data (current < total).
+ *
+ * Check during pgstat_progress_updates_*() rather than only from
+ * pgstat_progress_end_command() to catch issues with uninitialized/stale data
+ * from previous progress commands.
+ *
+ * If a command fails due to interrupt or error, the values may be less than
+ * the expected final value.
+ */
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64 *a = beentry->st_progress_param;
+
+	switch (beentry->st_progress_command)
+	{
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+				   a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+			break;
+
+		case PROGRESS_COMMAND_ANALYZE:
+			Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <=
+				   a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <=
+				   a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] <=
+				   a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]);
+			/* FALLTHROUGH */
+			/* ..because CLUSTER rebuilds indexes */
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			Assert(a[PROGRESS_CREATEIDX_TUPLES_DONE] <=
+				   a[PROGRESS_CREATEIDX_TUPLES_TOTAL]);
+			Assert(a[PROGRESS_CREATEIDX_PARTITIONS_DONE] <=
+				   a[PROGRESS_CREATEIDX_PARTITIONS_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			/* progress reporting is optional for these */
+			if (a[PROGRESS_BASEBACKUP_BACKUP_TOTAL] >= 0)
+			{
+				Assert(a[PROGRESS_BASEBACKUP_BACKUP_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_BACKUP_TOTAL]);
+				Assert(a[PROGRESS_BASEBACKUP_TBLSPC_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_TBLSPC_TOTAL]);
+			}
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+#if 0
+			//This currently fails file_fdw tests, since pgstat_prorgress evidently fails
+				// to support simultaneous copy commands, as happens during JOIN.
+			/* bytes progress is not available in all cases */
+				if (a[PROGRESS_COPY_BYTES_TOTAL] > 0)
+				//Assert(a[PROGRESS_COPY_BYTES_PROCESSED] <= a[PROGRESS_COPY_BYTES_TOTAL]);
+			if (a[PROGRESS_COPY_BYTES_PROCESSED] > a[PROGRESS_COPY_BYTES_TOTAL])
+				elog(WARNING, "PROGRESS_COPY_BYTES_PROCESSED %ld %ld",
+					 a[PROGRESS_COPY_BYTES_PROCESSED],
+					 a[PROGRESS_COPY_BYTES_TOTAL]);
+#endif
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;				/* Do nothing */
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -56,6 +136,8 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
@@ -113,6 +195,8 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
 	}
 
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
-- 
2.34.1

0003-f-also-assert-that-progress-values-don-t-go-backward.patchtext/x-diff; charset=us-asciiDownload
From 5ca1f068413a64eeba0a5fb914287f9855ffb9b7 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 21 Jan 2023 21:41:04 -0600
Subject: [PATCH 3/3] f! also assert that progress values don't go backwards
 and the total is constant

See also:
https://www.postgresql.org/message-id/CA%2BTgmoYSvEP3weQaCPGf6%2BDXLy2__JbJUYtoXyWP%3DqHcyGbihA%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c          | 37 +++++++++
 src/backend/commands/analyze.c                | 10 ++-
 src/backend/storage/lmgr/lmgr.c               | 24 +++---
 src/backend/utils/activity/backend_progress.c | 77 +++++++++++++++++++
 4 files changed, 135 insertions(+), 13 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f38..61cfbf6a17a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1042,6 +1042,13 @@ lazy_scan_heap(LVRelState *vacrel)
 
 				/* Forget the LP_DEAD items that we just vacuumed */
 				dead_items->num_items = 0;
+				{
+					const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+					const int64 progress_vals[] = {0};
+
+					pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+				}
+
 
 				/*
 				 * Periodically perform FSM vacuuming to make newly-freed
@@ -2199,6 +2206,13 @@ lazy_vacuum(LVRelState *vacrel)
 	{
 		Assert(!vacrel->do_index_cleanup);
 		vacrel->dead_items->num_items = 0;
+		{
+			const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+			const int64 progress_vals[] = {0};
+
+			pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+		}
+
 		return;
 	}
 
@@ -2301,6 +2315,13 @@ lazy_vacuum(LVRelState *vacrel)
 	 * vacuum)
 	 */
 	vacrel->dead_items->num_items = 0;
+
+	{
+		const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
+
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 }
 
 /*
@@ -2414,12 +2435,23 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 	BlockNumber vacuumed_pages = 0;
 	Buffer		vmbuffer = InvalidBuffer;
 	LVSavedErrInfo saved_err_info;
+#if 0
+	const int	progress_inds[] = {
+		PROGRESS_VACUUM_PHASE,
+		PROGRESS_VACUUM_NUM_DEAD_TUPLES,
+	};
+	const int64 progress_vals[] = {
+		PROGRESS_VACUUM_PHASE_VACUUM_HEAP,
+		0,
+	};
+#endif
 
 	Assert(vacrel->do_index_vacuuming);
 	Assert(vacrel->do_index_cleanup);
 	Assert(vacrel->num_index_scans > 0);
 
 	/* Report that we are now vacuuming the heap */
+	//pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
 
@@ -3190,7 +3222,12 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
 	dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
 	dead_items->max_items = max_items;
 	dead_items->num_items = 0;
+	{
+		const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
 
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 	vacrel->dead_items = dead_items;
 }
 
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 65750958bb2..3bfc941aa2c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1145,6 +1145,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	TableScanDesc scan;
 	BlockNumber nblocks;
 	BlockNumber blksdone = 0;
+	int64		progress_vals[2] = {0};
+	int const	progress_inds[2] = {
+		PROGRESS_ANALYZE_BLOCKS_DONE,
+		PROGRESS_ANALYZE_BLOCKS_TOTAL
+	};
+
 #ifdef USE_PREFETCH
 	int			prefetch_maximum = 0;	/* blocks to prefetch if enabled */
 	BlockSamplerData prefetch_bs;
@@ -1169,8 +1175,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 #endif
 
 	/* Report sampling block numbers */
-	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
-								 nblocks);
+	progress_vals[1] = nblocks;
+	pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a6726..8666d850660 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -912,6 +912,15 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	int			total = 0;
 	int			done = 0;
 
+	const int	index[] = {
+		PROGRESS_WAITFOR_TOTAL,
+		PROGRESS_WAITFOR_DONE,
+		PROGRESS_WAITFOR_CURRENT_PID
+	};
+	const int64 values[] = {
+		0, 0, 0
+	};
+
 	/* Done if no locks to wait for */
 	if (locktags == NIL)
 		return;
@@ -930,7 +939,10 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	}
 
 	if (progress)
+	{
+		pgstat_progress_update_multi_param(3, index, values);
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, total);
+	}
 
 	/*
 	 * Note: GetLockConflicts() never reports our own xid, hence we need not
@@ -960,19 +972,9 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 				pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, ++done);
 		}
 	}
-	if (progress)
-	{
-		const int	index[] = {
-			PROGRESS_WAITFOR_TOTAL,
-			PROGRESS_WAITFOR_DONE,
-			PROGRESS_WAITFOR_CURRENT_PID
-		};
-		const int64 values[] = {
-			0, 0, 0
-		};
 
+	if (progress)
 		pgstat_progress_update_multi_param(3, index, values);
-	}
 
 	list_free_deep(holders);
 }
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 63f9482b175..e4cfde8c723 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -117,6 +117,74 @@ pgstat_progress_asserts(void)
 	}
 }
 
+static void
+pgstat_progress_assert_forward_progress(int command, int index,
+										int64 oldval, int64 newval)
+{
+	switch (command)
+	{
+		case PROGRESS_COMMAND_ANALYZE:
+
+			/*
+			 * phase goes backwards for inheritance tables, which are sampled
+			 * twice
+			 */
+			if (index != PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID &&
+				index != PROGRESS_ANALYZE_PHASE)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_ANALYZE_BLOCKS_TOTAL ||
+				index == PROGRESS_ANALYZE_EXT_STATS_TOTAL ||
+				index == PROGRESS_ANALYZE_CHILD_TABLES_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			if (index != PROGRESS_CLUSTER_INDEX_RELID)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_CLUSTER_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			if (index != PROGRESS_CREATEIDX_INDEX_OID &&
+				index != PROGRESS_CREATEIDX_SUBPHASE &&
+				index != PROGRESS_WAITFOR_CURRENT_PID &&
+				index != PROGRESS_CREATEIDX_ACCESS_METHOD_OID)
+				Assert(newval >= oldval);
+
+			if (index == PROGRESS_CREATEIDX_TUPLES_TOTAL ||
+				index == PROGRESS_CREATEIDX_PARTITIONS_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL &&
+				oldval == 0 && newval == -1)
+				return;			/* Do nothing: this is the initial "null"
+								 * state before the size is estimated */
+			Assert(newval >= oldval);
+
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL ||
+				index == PROGRESS_BASEBACKUP_TBLSPC_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_COPY_BYTES_TOTAL)
+				Assert(oldval == 0);
+			break;
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_VACUUM_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -133,6 +201,15 @@ pgstat_progress_update_param(int index, int64 val)
 	if (!beentry || !pgstat_track_activities)
 		return;
 
+	if (index != PROGRESS_SCAN_BLOCKS_DONE)
+	{
+		/* Check that progress does not go backwards */
+		int64		oldval = beentry->st_progress_param[index];
+
+		pgstat_progress_assert_forward_progress(beentry->st_progress_command,
+												index, oldval, val);
+	}
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
-- 
2.34.1

#27Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#26)
Re: Progress report of CREATE INDEX for nested partitioned tables

Justin Pryzby <pryzby@telsasoft.com> writes:

Update to address a compiler warning in the supplementary patches adding
assertions.

I took a look through this. It seems like basically a good solution,
but the count_leaf_partitions() function is bothering me, for two
reasons:

1. It seems like a pretty expensive thing to do. Don't we have the
info at hand somewhere already?

2. Is it really safe to do find_all_inheritors with NoLock? If so,
a comment explaining why would be good.

regards, tom lane

#28Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#27)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

Update to address a compiler warning in the supplementary patches adding
assertions.

I took a look through this. It seems like basically a good solution,
but the count_leaf_partitions() function is bothering me, for two
reasons:

1. It seems like a pretty expensive thing to do. Don't we have the
info at hand somewhere already?

I don't know where that would be. We need the list of both direct *and*
indirect partitions. See:
/messages/by-id/5073D187-4200-4A2D-BAC0-91C657E3C22E@gmail.com

If it would help to avoid the concern, then I might consider proposing
not to call get_rel_relkind() ...

2. Is it really safe to do find_all_inheritors with NoLock? If so,
a comment explaining why would be good.

In both cases (both for the parent and for case of a partitioned child
with pre-existing indexes being ATTACHed), the table itself is already
locked by DefineIndex():

lockmode = concurrent ? ShareUpdateExclusiveLock : ShareLock;
rel = table_open(relationId, lockmode);

and
childrel = table_open(childRelid, lockmode);
...
table_close(childrel, NoLock);

And, find_all_inheritors() will also have been called by
ProcessUtilitySlow(). Maybe it's sufficient to mention that ?

--
Justin

#29Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#28)
Re: Progress report of CREATE INDEX for nested partitioned tables

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:

I took a look through this. It seems like basically a good solution,
but the count_leaf_partitions() function is bothering me, for two
reasons:

... find_all_inheritors() will also have been called by
ProcessUtilitySlow(). Maybe it's sufficient to mention that ?

Hm. Could we get rid of count_leaf_partitions by doing the work in
ProcessUtilitySlow? Or at least passing that OID list forward instead
of recomputing it?

regards, tom lane

#30Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#29)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Fri, Mar 10, 2023 at 03:36:10PM -0500, Tom Lane wrote:

I took a look through this. It seems like basically a good solution,
but the count_leaf_partitions() function is bothering me, for two
reasons:

... find_all_inheritors() will also have been called by
ProcessUtilitySlow(). Maybe it's sufficient to mention that ?

Hm. Could we get rid of count_leaf_partitions by doing the work in
ProcessUtilitySlow? Or at least passing that OID list forward instead
of recomputing it?

count_leaf_partitions() is called in two places:

Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL. It'd be easy enough to
pass an integer total via IndexStmt (but I think we wanted to avoid
adding anything there, since it's not a part of the statement).

count_leaf_partitions() is also called for sub-partitions, in the case
that a matching "partitioned index" already exists, and the progress
report needs to be incremented by the number of leaves for which indexes
were ATTACHED. We'd need a mapping from OID => npartitions (or to
compile some data structure of all the partitioned partitions). I guess
CreateIndex() could call CreatePartitionDirectory(). But it looks like
that would be *more* expensive.

--
Justin

#31Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#30)
Re: Progress report of CREATE INDEX for nested partitioned tables

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:

Hm. Could we get rid of count_leaf_partitions by doing the work in
ProcessUtilitySlow? Or at least passing that OID list forward instead
of recomputing it?

count_leaf_partitions() is called in two places:

Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL. It'd be easy enough to
pass an integer total via IndexStmt (but I think we wanted to avoid
adding anything there, since it's not a part of the statement).

I agree that adding such a field to IndexStmt would be a very bad idea.
However, adding another parameter to DefineIndex doesn't seem like a
problem. Or could we just call pgstat_progress_update_param directly from
ProcessUtilitySlow, after counting the partitions in the existing loop?

count_leaf_partitions() is also called for sub-partitions, in the case
that a matching "partitioned index" already exists, and the progress
report needs to be incremented by the number of leaves for which indexes
were ATTACHED.

Can't you increment progress by one at the point where the actual attach
happens?

I also wonder whether leaving non-leaf partitions out of the total
is making things more complicated rather than simpler ...

We'd need a mapping from OID => npartitions (or to
compile some data structure of all the partitioned partitions). I guess
CreateIndex() could call CreatePartitionDirectory(). But it looks like
that would be *more* expensive.

The reason I find this annoying is that the non-optional nature of the
progress reporting mechanism was sold on the basis that it would add
only negligible overhead. Adding extra pass(es) over pg_inherits
breaks that promise. Maybe it's cheap enough to not matter in the
big scheme of things, but we should not be having to make arguments
like that one.

regards, tom lane

#32Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#31)
Re: Progress report of CREATE INDEX for nested partitioned tables

I wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

count_leaf_partitions() is also called for sub-partitions, in the case
that a matching "partitioned index" already exists, and the progress
report needs to be incremented by the number of leaves for which indexes
were ATTACHED.

Can't you increment progress by one at the point where the actual attach
happens?

Oh, never mind; now I realize that the point is that you didn't ever
iterate over those leaf indexes.

However, consider a thought experiment: assume for whatever reason that
all the actual index builds happen first, then all the cases where you
succeed in attaching a sub-partitioned index happen at the end of the
command. In that case, the percentage-done indicator would go from
some-number to 100% more or less instantly.

What if we simply do nothing at sub-partitioned indexes? Or if that's
slightly too radical, just increase the PARTITIONS_DONE counter by 1?
That would look indistinguishable from the case where all the attaches
happen at the end.

regards, tom lane

#33Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#32)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Mar 12, 2023 at 04:14:06PM -0400, Tom Lane wrote:

Hm. Could we get rid of count_leaf_partitions by doing the work in
ProcessUtilitySlow? Or at least passing that OID list forward instead
of recomputing it?

count_leaf_partitions() is called in two places:

Once to get PROGRESS_CREATEIDX_PARTITIONS_TOTAL. It'd be easy enough to
pass an integer total via IndexStmt (but I think we wanted to avoid
adding anything there, since it's not a part of the statement).

I agree that adding such a field to IndexStmt would be a very bad idea.
However, adding another parameter to DefineIndex doesn't seem like a
problem.

It's a problem since this is a bug and it's desirable to backpatch a
fix, right ?

Or could we just call pgstat_progress_update_param directly from
ProcessUtilitySlow, after counting the partitions in the existing loop?

That'd be fine if it was only needed for TOTAL, but it doesn't handle
the 2nd call to count_leaf_partitions().

On Sun, Mar 12, 2023 at 08:15:28PM -0400, Tom Lane wrote:

I wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

count_leaf_partitions() is also called for sub-partitions, in the case
that a matching "partitioned index" already exists, and the progress
report needs to be incremented by the number of leaves for which indexes
were ATTACHED.

Can't you increment progress by one at the point where the actual attach
happens?

Oh, never mind; now I realize that the point is that you didn't ever
iterate over those leaf indexes.

However, consider a thought experiment: assume for whatever reason that
all the actual index builds happen first, then all the cases where you
succeed in attaching a sub-partitioned index happen at the end of the
command. In that case, the percentage-done indicator would go from
some-number to 100% more or less instantly.

What if we simply do nothing at sub-partitioned indexes? Or if that's
slightly too radical, just increase the PARTITIONS_DONE counter by 1?
That would look indistinguishable from the case where all the attaches
happen at the end.

Incrementing by 0 sounds terrible, since someone who has intermediate
partitioned tables is likely to always see 0% done. (It's true that
intermediate partitioned tables don't seem to have been considered by
the original patch, and it's indisputable that progress reporting
currently misbehaves in that case).

And incrementing PARTITIONS_DONE by 1 could lead to bogus progress
reporting with "N_done > N_Total" if an intermediate partitioned table
had no leaf partitions at all. That's one of the problems this thread
is trying to fix (the other being "total changing in the middle of the
command").

Maybe your idea is usable though, since indirect partitioned indexes
*can* be counted correctly during recursion. What's hard to fix is the
case that an index is both *partitioned* and *attached*. Maybe it's
okay to count that case as 0. The consequence is that the command would
end before the progress report got to 100%.

The other option seems to be to define the progress report to count only
*direct* children.
/messages/by-id/20221213044331.GJ27893@telsasoft.com

The reason I find this annoying is that the non-optional nature of the
progress reporting mechanism was sold on the basis that it would add
only negligible overhead. Adding extra pass(es) over pg_inherits
breaks that promise. Maybe it's cheap enough to not matter in the
big scheme of things, but we should not be having to make arguments
like that one.

If someone is running a DDL command involving nested partitions, I'm not
so concerned about the cost of additional scans of pg_inherits. They
either have enough data to justify partitioning partitions, or they're
doing something silly.

--
Justin

#34Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#33)
Re: Progress report of CREATE INDEX for nested partitioned tables

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:

I agree that adding such a field to IndexStmt would be a very bad idea.
However, adding another parameter to DefineIndex doesn't seem like a
problem.

It's a problem since this is a bug and it's desirable to backpatch a
fix, right ?

I do not think this is important enough to justify a back-patch.

Incrementing by 0 sounds terrible, since someone who has intermediate
partitioned tables is likely to always see 0% done.

How so? The counter will increase after there's some actual work done,
ie building an index. If there's no indexes to build then it hardly
matters, because the command will complete in very little time.

And incrementing PARTITIONS_DONE by 1 could lead to bogus progress
reporting with "N_done > N_Total" if an intermediate partitioned table
had no leaf partitions at all.

Well, we could fix that if we made TOTAL be the total number of
descendants rather than just the leaves ;-). But I think not
incrementing is probably better.

regards, tom lane

#35Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#34)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Mon, Mar 13, 2023 at 10:42:59AM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:

I agree that adding such a field to IndexStmt would be a very bad idea.
However, adding another parameter to DefineIndex doesn't seem like a
problem.

It's a problem since this is a bug and it's desirable to backpatch a
fix, right ?

I do not think this is important enough to justify a back-patch.

That's fine with me, but it comes as a surprise, and it might invalidate
earlier discussions, which were working under the constraint of
maintaining a compatible ABI.

Incrementing by 0 sounds terrible, since someone who has intermediate
partitioned tables is likely to always see 0% done.

How so? The counter will increase after there's some actual work done,
ie building an index. If there's no indexes to build then it hardly
matters, because the command will complete in very little time.

I misunderstood your idea as suggesting to skip progress reporting for
*every* intermediate partitioned index, and its leaves. But I guess
what you meant is to skip progress reporting when ATTACHing an
intermediate partitioned index. That seems okay, since 1) intermediate
partitioned tables are probably rare, and 2) ATTACH is fast, so the
effect is indistinguisable from querying the progress report a few
moments later.

The idea would be for:
1) TOTAL to show the number of direct and indirect leaf partitions;
2) update progress while building direct or indirect indexes;
3) ATTACHing intermediate partitioned tables to increment by 0;
4) ATTACHing a direct child should continue to increment by 1,
since that common case already works as expected and shouldn't be
changed.

The only change from the current patch is (3). (1) still calls
count_leaf_partitions(), but only once. I'd prefer that to rearranging
the progress reporting to set the TOTAL in ProcessUtilitySlow().

--
Justin

#36Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#35)
Re: Progress report of CREATE INDEX for nested partitioned tables

Justin Pryzby <pryzby@telsasoft.com> writes:

The idea would be for:
1) TOTAL to show the number of direct and indirect leaf partitions;
2) update progress while building direct or indirect indexes;
3) ATTACHing intermediate partitioned tables to increment by 0;
4) ATTACHing a direct child should continue to increment by 1,
since that common case already works as expected and shouldn't be
changed.

OK.

The only change from the current patch is (3). (1) still calls
count_leaf_partitions(), but only once. I'd prefer that to rearranging
the progress reporting to set the TOTAL in ProcessUtilitySlow().

I don't agree with that. find_all_inheritors is fairly expensive
and it seems completely silly to do it twice just to avoid adding
a parameter to DefineIndex.

regards, tom lane

#37Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#35)
Re: Progress report of CREATE INDEX for nested partitioned tables

14 марта 2023 г., в 18:34, Justin Pryzby <pryzby@telsasoft.com> написал(а):

On Mon, Mar 13, 2023 at 10:42:59AM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sun, Mar 12, 2023 at 06:25:13PM -0400, Tom Lane wrote:

I agree that adding such a field to IndexStmt would be a very bad idea.
However, adding another parameter to DefineIndex doesn't seem like a
problem.

It's a problem since this is a bug and it's desirable to backpatch a
fix, right ?

I do not think this is important enough to justify a back-patch.

That's fine with me, but it comes as a surprise, and it might invalidate
earlier discussions, which were working under the constraint of
maintaining a compatible ABI.

Incrementing by 0 sounds terrible, since someone who has intermediate
partitioned tables is likely to always see 0% done.

How so? The counter will increase after there's some actual work done,
ie building an index. If there's no indexes to build then it hardly
matters, because the command will complete in very little time.

I misunderstood your idea as suggesting to skip progress reporting for
*every* intermediate partitioned index, and its leaves. But I guess
what you meant is to skip progress reporting when ATTACHing an
intermediate partitioned index. That seems okay, since 1) intermediate
partitioned tables are probably rare, and 2) ATTACH is fast, so the
effect is indistinguisable from querying the progress report a few
moments later.

+1 that counting attached partitioned indexes as 0 is fine.

The idea would be for:
1) TOTAL to show the number of direct and indirect leaf partitions;
2) update progress while building direct or indirect indexes;
3) ATTACHing intermediate partitioned tables to increment by 0;
4) ATTACHing a direct child should continue to increment by 1,
since that common case already works as expected and shouldn't be
changed.

The only change from the current patch is (3). (1) still calls
count_leaf_partitions(), but only once. I'd prefer that to rearranging
the progress reporting to set the TOTAL in ProcessUtilitySlow().

--
Justin

As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site for DefineIndex (although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE that calls DefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in unnecessary code duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are going to optimize it, after all. Especially if back-patching is a non-issue.

#38Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#37)
1 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:

The only change from the current patch is (3). (1) still calls
count_leaf_partitions(), but only once. I'd prefer that to rearranging
the progress reporting to set the TOTAL in ProcessUtilitySlow().

As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site for DefineIndex (although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE that calls DefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in unnecessary code duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are going to optimize it, after all. Especially if back-patching is a non-issue.

Yeah. See attached. I don't like duplicating the loop. Is this really
the right direction to go ?

I haven't verified if the child tables are locked in all the paths which
would call count_leaf_partitions(). But why is it important to lock
them for this? If they weren't locked before, that'd be a pre-existing
problem...

Attachments:

0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patchtext/x-diff; charset=us-asciiDownload
From 1e05b055d1bd21265b68265ddef5e9654629087c Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml                  | 10 ++-
 src/backend/bootstrap/bootparse.y             |  2 +
 src/backend/commands/indexcmds.c              | 70 +++++++++++++++++--
 src/backend/commands/tablecmds.c              |  4 +-
 src/backend/tcop/utility.c                    |  8 ++-
 src/backend/utils/activity/backend_progress.c | 28 ++++++++
 src/include/commands/defrem.h                 |  1 +
 src/include/utils/backend_progress.h          |  1 +
 8 files changed, 115 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d02..3f891b75541 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6901,7 +6901,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of partitions on which the index is to be created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6912,7 +6915,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of partitions on which the index has been created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 86804bb598e..81a1b7bfec3 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -306,6 +306,7 @@ Boot_DeclareIndexStmt:
 								$4,
 								InvalidOid,
 								InvalidOid,
+								-1,
 								false,
 								false,
 								false,
@@ -358,6 +359,7 @@ Boot_DeclareUniqueIndexStmt:
 								$5,
 								InvalidOid,
 								InvalidOid,
+								-1,
 								false,
 								false,
 								false,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..62a8f0d6aa2 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,32 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions.  Note that this
+ * also excludes foreign tables.  This should be consistent with the loop in
+ * ProcessUtilitySlow().
+ * XXX: are the partitions already locked when this is called by other code paths ?
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid			partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -518,6 +544,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
  *		case the caller had better have checked it earlier.
  * 'skip_build': make the catalog entries but don't create the index files
  * 'quiet': suppress the NOTICE chatter ordinarily provided for constraints.
+ * 'total_leaves': total number of leaf partitions (excluding foreign tables)
  *
  * Returns the object address of the created index.
  */
@@ -527,6 +554,7 @@ DefineIndex(Oid relationId,
 			Oid indexRelationId,
 			Oid parentIndexId,
 			Oid parentConstraintId,
+			int total_leaves,
 			bool is_alter_table,
 			bool check_rights,
 			bool check_not_in_use,
@@ -1219,8 +1247,22 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			/*
+			 * Set the total number of partitions at the start of the command;
+			 * don't update it when being called recursively.
+			 */
+			if (!OidIsValid(parentIndexId))
+			{
+				 /*
+				  * When called by ProcessUtilitySlow() the number of leaves is
+				  * passed in as an optimization.
+				  */
+				 if (total_leaves < 0)
+					 total_leaves = count_leaf_partitions(relationId);
+
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_leaves);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1292,7 @@ DefineIndex(Oid relationId,
 			{
 				Oid			childRelid = part_oids[i];
 				Relation	childrel;
+				char		child_relkind;
 				Oid			child_save_userid;
 				int			child_save_sec_context;
 				int			child_save_nestlevel;
@@ -1259,6 +1302,7 @@ DefineIndex(Oid relationId,
 				bool		found = false;
 
 				childrel = table_open(childRelid, lockmode);
+				child_relkind = RelationGetForm(childrel)->relkind;
 
 				GetUserIdAndSecContext(&child_save_userid,
 									   &child_save_sec_context);
@@ -1426,14 +1470,24 @@ DefineIndex(Oid relationId,
 								InvalidOid, /* no predefined OID */
 								indexRelationId,	/* this is our child */
 								createdConstraintId,
+								-1, /* The progress isn't updated during recursion */
 								is_alter_table, check_rights, check_not_in_use,
 								skip_build, quiet);
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					/*
+					 * If a pre-existing index was attached, the progress
+					 * report is updated here.  But if the index is
+					 * partitioned, we don't count its partitions, since
+					 * that's expensive, and the ATTACH is fast anyway.
+					 */
+					if (!RELKIND_HAS_PARTITIONS(child_relkind))
+						pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1538,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done. When called recursively
+		 * for child tables, the done partition counter is incremented now,
+		 * rather than in the caller.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3e2c5f797cd..2ef4491ec8e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1216,6 +1216,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 						InvalidOid,
 						RelationGetRelid(idxRel),
 						constraintOid,
+						-1,
 						false, false, false, false, false);
 
 			index_close(idxRel, AccessShareLock);
@@ -8640,6 +8641,7 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 						  InvalidOid,	/* no predefined OID */
 						  InvalidOid,	/* no parent index */
 						  InvalidOid,	/* no parent constraint */
+						  -1,
 						  true, /* is_alter_table */
 						  check_rights,
 						  false,	/* check_not_in_use - we did it already */
@@ -18105,7 +18107,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 										   &conOid);
 			DefineIndex(RelationGetRelid(attachrel), stmt, InvalidOid,
 						RelationGetRelid(idxRel),
-						conOid,
+						conOid, -1,
 						true, false, false, false, false);
 		}
 
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index c7d9d96b45d..c8317822a64 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1462,6 +1462,7 @@ ProcessUtilitySlow(ParseState *pstate,
 					Oid			relid;
 					LOCKMODE	lockmode;
 					bool		is_alter_table;
+					int			total_leaves = 0;
 
 					if (stmt->concurrent)
 						PreventInTransactionBlock(isTopLevel,
@@ -1502,7 +1503,8 @@ ProcessUtilitySlow(ParseState *pstate,
 						inheritors = find_all_inheritors(relid, lockmode, NULL);
 						foreach(lc, inheritors)
 						{
-							char		relkind = get_rel_relkind(lfirst_oid(lc));
+							Oid			partrelid = lfirst_oid(lc);
+							char		relkind = get_rel_relkind(partrelid);
 
 							if (relkind != RELKIND_RELATION &&
 								relkind != RELKIND_MATVIEW &&
@@ -1519,6 +1521,9 @@ ProcessUtilitySlow(ParseState *pstate,
 												stmt->relation->relname),
 										 errdetail("Table \"%s\" contains partitions that are foreign tables.",
 												   stmt->relation->relname)));
+
+							if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+								total_leaves++;
 						}
 						list_free(inheritors);
 					}
@@ -1545,6 +1550,7 @@ ProcessUtilitySlow(ParseState *pstate,
 									InvalidOid, /* no predefined OID */
 									InvalidOid, /* no parent index */
 									InvalidOid, /* no parent constraint */
+									total_leaves,
 									is_alter_table,
 									true,	/* check_rights */
 									true,	/* check_not_in_use */
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b19..2a9994b98fd 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,34 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of the current backend.
+ *-----------
+ */
+void
+pgstat_progress_incr_param(int index, int64 incr)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64		val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because no other process should write to this backend's own status, we
+	 * can read its value from shared memory without needing to loop to ensure
+	 * its consistency.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 4f7f87fc62c..b774b3028ce 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -29,6 +29,7 @@ extern ObjectAddress DefineIndex(Oid relationId,
 								 Oid indexRelationId,
 								 Oid parentIndexId,
 								 Oid parentConstraintId,
+								 int total_leaves,
 								 bool is_alter_table,
 								 bool check_rights,
 								 bool check_not_in_use,
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab6..a84752ade99 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.34.1

#39Ilya Gladyshev
ilya.v.gladyshev@gmail.com
In reply to: Justin Pryzby (#38)
Re: Progress report of CREATE INDEX for nested partitioned tables

16 марта 2023 г., в 04:07, Justin Pryzby <pryzby@telsasoft.com> написал(а):

On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:

The only change from the current patch is (3). (1) still calls
count_leaf_partitions(), but only once. I'd prefer that to rearranging
the progress reporting to set the TOTAL in ProcessUtilitySlow().

As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site for DefineIndex (although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE that calls DefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in unnecessary code duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are going to optimize it, after all. Especially if back-patching is a non-issue.

Yeah. See attached. I don't like duplicating the loop. Is this really
the right direction to go ?

I haven't verified if the child tables are locked in all the paths which
would call count_leaf_partitions(). But why is it important to lock
them for this? If they weren't locked before, that'd be a pre-existing
problem...
<0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patch>

I’m not sure what the general policy on locking is, but I have checked ALTER TABLE ADD INDEX, and the all the partitions seem to be locked on the first entry to DefineIndex there. All other call sites pass in the parentIndexId, which means the progress tracking machinery will not be initialized, so I think, we don’t need to do locking in count_leaf_partitions().

The approach in the patch looks good to me. Some nitpicks on the patch:
1. There’s an unnecessary second call to get_rel_relkind in ProcessUtilitySlow, we can just use what’s in the variable relkind.
2. We can also combine else and if to have one less nested level like that:
+ else if (!RELKIND_HAS_PARTITIONS(child_relkind))

3. There was a part of the comment saying "If the index was built by calling DefineIndex() recursively, the called function is responsible for updating the progress report for built indexes.", I think it is still useful to have it there.

#40Justin Pryzby
pryzby@telsasoft.com
In reply to: Ilya Gladyshev (#39)
3 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Thu, Mar 16, 2023 at 07:04:16PM +0400, Ilya Gladyshev wrote:

16 марта 2023 г., в 04:07, Justin Pryzby <pryzby@telsasoft.com> написал(а):

On Tue, Mar 14, 2023 at 06:58:14PM +0400, Ilya Gladyshev wrote:

The only change from the current patch is (3). (1) still calls
count_leaf_partitions(), but only once. I'd prefer that to rearranging
the progress reporting to set the TOTAL in ProcessUtilitySlow().

As for reusing TOTAL calculated outside of DefineIndex, as I can see, ProcessUtilitySlow is not the only call site for DefineIndex (although, I don’t know whether all of them need progress tracking), for instance, there is ALTER TABLE that calls DefineIndex to create index for constraints. So I feel like rearranging progress reporting will result in unnecessary code duplication in those call sites, so passing in an optional parameter seems to be easier here, if we are going to optimize it, after all. Especially if back-patching is a non-issue.

Yeah. See attached. I don't like duplicating the loop. Is this really
the right direction to go ?

I haven't verified if the child tables are locked in all the paths which
would call count_leaf_partitions(). But why is it important to lock
them for this? If they weren't locked before, that'd be a pre-existing
problem...
<0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patch>

I’m not sure what the general policy on locking is, but I have checked ALTER TABLE ADD INDEX, and the all the partitions seem to be locked on the first entry to DefineIndex there. All other call sites pass in the parentIndexId, which means the progress tracking machinery will not be initialized, so I think, we don’t need to do locking in count_leaf_partitions().

The approach in the patch looks good to me. Some nitpicks on the patch:
1. There’s an unnecessary second call to get_rel_relkind in ProcessUtilitySlow, we can just use what’s in the variable relkind.
2. We can also combine else and if to have one less nested level like that:
+ else if (!RELKIND_HAS_PARTITIONS(child_relkind))

3. There was a part of the comment saying "If the index was built by calling DefineIndex() recursively, the called function is responsible for updating the progress report for built indexes.", I think it is still useful to have it there.

Thanks, I addressed (1) and (3). (2) is deliberate, to allow a place to
put the comment which is not specific to the !HAS_PARTITIONS case.

--
Justin

Attachments:

0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patchtext/x-diff; charset=us-asciiDownload
From 90300bc4ca08088de3a0015ff5c1a251537feacc Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, but doesn't count intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev
Reviewed-By: Justin Pryzby, Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml                  | 10 ++-
 src/backend/bootstrap/bootparse.y             |  2 +
 src/backend/commands/indexcmds.c              | 74 +++++++++++++++++--
 src/backend/commands/tablecmds.c              |  4 +-
 src/backend/tcop/utility.c                    |  8 +-
 src/backend/utils/activity/backend_progress.c | 28 +++++++
 src/include/commands/defrem.h                 |  1 +
 src/include/utils/backend_progress.h          |  1 +
 8 files changed, 119 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 6249bb50d02..3f891b75541 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6901,7 +6901,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of partitions on which the index is to be created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6912,7 +6915,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of partitions on which the index has been created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, but excludes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 86804bb598e..81a1b7bfec3 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -306,6 +306,7 @@ Boot_DeclareIndexStmt:
 								$4,
 								InvalidOid,
 								InvalidOid,
+								-1,
 								false,
 								false,
 								false,
@@ -358,6 +359,7 @@ Boot_DeclareUniqueIndexStmt:
 								$5,
 								InvalidOid,
 								InvalidOid,
+								-1,
 								false,
 								false,
 								false,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 16ec0b114e6..2a87c851746 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -130,6 +130,31 @@ typedef struct ReindexErrorInfo
 	char		relkind;
 } ReindexErrorInfo;
 
+
+/*
+ * Count the number of direct and indirect leaf partitions.  Note that this
+ * also excludes foreign tables.  This should be consistent with the loop in
+ * ProcessUtilitySlow().
+ */
+static int
+count_leaf_partitions(Oid relid)
+{
+	int			nleaves = 0;
+	List	   *childs = find_all_inheritors(relid, NoLock, NULL);
+	ListCell   *lc;
+
+	foreach(lc, childs)
+	{
+		Oid			partrelid = lfirst_oid(lc);
+
+		if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))
+			nleaves++;
+	}
+
+	list_free(childs);
+	return nleaves;
+}
+
 /*
  * CheckIndexCompatible
  *		Determine whether an existing index definition is compatible with a
@@ -518,6 +543,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
  *		case the caller had better have checked it earlier.
  * 'skip_build': make the catalog entries but don't create the index files
  * 'quiet': suppress the NOTICE chatter ordinarily provided for constraints.
+ * 'total_leaves': total number of leaf partitions (excluding foreign tables)
  *
  * Returns the object address of the created index.
  */
@@ -527,6 +553,7 @@ DefineIndex(Oid relationId,
 			Oid indexRelationId,
 			Oid parentIndexId,
 			Oid parentConstraintId,
+			int total_leaves,
 			bool is_alter_table,
 			bool check_rights,
 			bool check_not_in_use,
@@ -1219,8 +1246,22 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			/*
+			 * Set the total number of partitions at the start of the command;
+			 * don't update it when being called recursively.
+			 */
+			if (!OidIsValid(parentIndexId))
+			{
+				 /*
+				  * When called by ProcessUtilitySlow(), the number of leaves is
+				  * passed in as an optimization.
+				  */
+				 if (total_leaves < 0)
+					 total_leaves = count_leaf_partitions(relationId);
+
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_leaves);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1250,6 +1291,7 @@ DefineIndex(Oid relationId,
 			{
 				Oid			childRelid = part_oids[i];
 				Relation	childrel;
+				char		child_relkind;
 				Oid			child_save_userid;
 				int			child_save_sec_context;
 				int			child_save_nestlevel;
@@ -1259,6 +1301,7 @@ DefineIndex(Oid relationId,
 				bool		found = false;
 
 				childrel = table_open(childRelid, lockmode);
+				child_relkind = RelationGetForm(childrel)->relkind;
 
 				GetUserIdAndSecContext(&child_save_userid,
 									   &child_save_sec_context);
@@ -1426,14 +1469,29 @@ DefineIndex(Oid relationId,
 								InvalidOid, /* no predefined OID */
 								indexRelationId,	/* this is our child */
 								createdConstraintId,
+								-1,
 								is_alter_table, check_rights, check_not_in_use,
 								skip_build, quiet);
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					/*
+					 * If the index was built by calling DefineIndex()
+					 * recursively, the called function is responsible for
+					 * updating the progress report for built indexes.
+					 *
+					 * If a pre-existing index was attached, the progress
+					 * report is updated here.  But if the index is
+					 * partitioned, we don't count its partitions, since that's
+					 * expensive, and the ATTACH is fast anyway.
+					 */
+
+					if (!RELKIND_HAS_PARTITIONS(child_relkind))
+						pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1484,9 +1542,15 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, we're done. When called recursively
+		 * for child tables, the done partition counter is incremented now,
+		 * rather than in the caller.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3e2c5f797cd..df2fe468b65 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1216,6 +1216,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 						InvalidOid,
 						RelationGetRelid(idxRel),
 						constraintOid,
+						-1,
 						false, false, false, false, false);
 
 			index_close(idxRel, AccessShareLock);
@@ -8640,6 +8641,7 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 						  InvalidOid,	/* no predefined OID */
 						  InvalidOid,	/* no parent index */
 						  InvalidOid,	/* no parent constraint */
+						  -1,			/* total_leaves unknown */
 						  true, /* is_alter_table */
 						  check_rights,
 						  false,	/* check_not_in_use - we did it already */
@@ -18105,7 +18107,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 										   &conOid);
 			DefineIndex(RelationGetRelid(attachrel), stmt, InvalidOid,
 						RelationGetRelid(idxRel),
-						conOid,
+						conOid, -1,
 						true, false, false, false, false);
 		}
 
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index eada7353639..bbc3d7ada23 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1465,6 +1465,7 @@ ProcessUtilitySlow(ParseState *pstate,
 					Oid			relid;
 					LOCKMODE	lockmode;
 					bool		is_alter_table;
+					int			total_leaves = 0;
 
 					if (stmt->concurrent)
 						PreventInTransactionBlock(isTopLevel,
@@ -1505,7 +1506,8 @@ ProcessUtilitySlow(ParseState *pstate,
 						inheritors = find_all_inheritors(relid, lockmode, NULL);
 						foreach(lc, inheritors)
 						{
-							char		relkind = get_rel_relkind(lfirst_oid(lc));
+							Oid			partrelid = lfirst_oid(lc);
+							char		relkind = get_rel_relkind(partrelid);
 
 							if (relkind != RELKIND_RELATION &&
 								relkind != RELKIND_MATVIEW &&
@@ -1522,6 +1524,9 @@ ProcessUtilitySlow(ParseState *pstate,
 												stmt->relation->relname),
 										 errdetail("Table \"%s\" contains partitions that are foreign tables.",
 												   stmt->relation->relname)));
+
+							if (RELKIND_HAS_STORAGE(relkind))
+								total_leaves++;
 						}
 						list_free(inheritors);
 					}
@@ -1548,6 +1553,7 @@ ProcessUtilitySlow(ParseState *pstate,
 									InvalidOid, /* no predefined OID */
 									InvalidOid, /* no parent index */
 									InvalidOid, /* no parent constraint */
+									total_leaves,
 									is_alter_table,
 									true,	/* check_rights */
 									true,	/* check_not_in_use */
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b19..2a9994b98fd 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,34 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of the current backend.
+ *-----------
+ */
+void
+pgstat_progress_incr_param(int index, int64 incr)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64		val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because no other process should write to this backend's own status, we
+	 * can read its value from shared memory without needing to loop to ensure
+	 * its consistency.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 4f7f87fc62c..b774b3028ce 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -29,6 +29,7 @@ extern ObjectAddress DefineIndex(Oid relationId,
 								 Oid indexRelationId,
 								 Oid parentIndexId,
 								 Oid parentConstraintId,
+								 int total_leaves,
 								 bool is_alter_table,
 								 bool check_rights,
 								 bool check_not_in_use,
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab6..a84752ade99 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.34.1

0002-assertions-for-progress-reporting.patchtext/x-diff; charset=us-asciiDownload
From a3965c2025062c5dc7080c1025e530628a6180cf Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 1 Feb 2023 10:23:53 -0600
Subject: [PATCH 2/3] assertions for progress reporting

---
 src/backend/commands/analyze.c                | 10 ++-
 src/backend/utils/activity/backend_progress.c | 84 +++++++++++++++++++
 2 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 65750958bb2..3bfc941aa2c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1145,6 +1145,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	TableScanDesc scan;
 	BlockNumber nblocks;
 	BlockNumber blksdone = 0;
+	int64		progress_vals[2] = {0};
+	int const	progress_inds[2] = {
+		PROGRESS_ANALYZE_BLOCKS_DONE,
+		PROGRESS_ANALYZE_BLOCKS_TOTAL
+	};
+
 #ifdef USE_PREFETCH
 	int			prefetch_maximum = 0;	/* blocks to prefetch if enabled */
 	BlockSamplerData prefetch_bs;
@@ -1169,8 +1175,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 #endif
 
 	/* Report sampling block numbers */
-	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
-								 nblocks);
+	progress_vals[1] = nblocks;
+	pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 2a9994b98fd..63f9482b175 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,85 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*
+ * Check for consistency of progress data (current < total).
+ *
+ * Check during pgstat_progress_updates_*() rather than only from
+ * pgstat_progress_end_command() to catch issues with uninitialized/stale data
+ * from previous progress commands.
+ *
+ * If a command fails due to interrupt or error, the values may be less than
+ * the expected final value.
+ */
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64 *a = beentry->st_progress_param;
+
+	switch (beentry->st_progress_command)
+	{
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+				   a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+			break;
+
+		case PROGRESS_COMMAND_ANALYZE:
+			Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <=
+				   a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <=
+				   a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] <=
+				   a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]);
+			/* FALLTHROUGH */
+			/* ..because CLUSTER rebuilds indexes */
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			Assert(a[PROGRESS_CREATEIDX_TUPLES_DONE] <=
+				   a[PROGRESS_CREATEIDX_TUPLES_TOTAL]);
+			Assert(a[PROGRESS_CREATEIDX_PARTITIONS_DONE] <=
+				   a[PROGRESS_CREATEIDX_PARTITIONS_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			/* progress reporting is optional for these */
+			if (a[PROGRESS_BASEBACKUP_BACKUP_TOTAL] >= 0)
+			{
+				Assert(a[PROGRESS_BASEBACKUP_BACKUP_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_BACKUP_TOTAL]);
+				Assert(a[PROGRESS_BASEBACKUP_TBLSPC_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_TBLSPC_TOTAL]);
+			}
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+#if 0
+			//This currently fails file_fdw tests, since pgstat_prorgress evidently fails
+				// to support simultaneous copy commands, as happens during JOIN.
+			/* bytes progress is not available in all cases */
+				if (a[PROGRESS_COPY_BYTES_TOTAL] > 0)
+				//Assert(a[PROGRESS_COPY_BYTES_PROCESSED] <= a[PROGRESS_COPY_BYTES_TOTAL]);
+			if (a[PROGRESS_COPY_BYTES_PROCESSED] > a[PROGRESS_COPY_BYTES_TOTAL])
+				elog(WARNING, "PROGRESS_COPY_BYTES_PROCESSED %ld %ld",
+					 a[PROGRESS_COPY_BYTES_PROCESSED],
+					 a[PROGRESS_COPY_BYTES_TOTAL]);
+#endif
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;				/* Do nothing */
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -56,6 +136,8 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
@@ -113,6 +195,8 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
 	}
 
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
-- 
2.34.1

0003-f-also-assert-that-progress-values-don-t-go-backward.patchtext/x-diff; charset=us-asciiDownload
From 84388375d4808d44877fc9e26eacf9219158be2b Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 21 Jan 2023 21:41:04 -0600
Subject: [PATCH 3/3] f! also assert that progress values don't go backwards
 and the total is constant

pgstat_progress_update_multi_param() is being abused as a way to update
a value which might otherwise violate the assertion.

See also:
https://www.postgresql.org/message-id/CA%2BTgmoYSvEP3weQaCPGf6%2BDXLy2__JbJUYtoXyWP%3DqHcyGbihA%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c          | 37 +++++++++
 src/backend/storage/lmgr/lmgr.c               | 24 +++---
 src/backend/utils/activity/backend_progress.c | 81 +++++++++++++++++++
 3 files changed, 130 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f38..61cfbf6a17a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1042,6 +1042,13 @@ lazy_scan_heap(LVRelState *vacrel)
 
 				/* Forget the LP_DEAD items that we just vacuumed */
 				dead_items->num_items = 0;
+				{
+					const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+					const int64 progress_vals[] = {0};
+
+					pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+				}
+
 
 				/*
 				 * Periodically perform FSM vacuuming to make newly-freed
@@ -2199,6 +2206,13 @@ lazy_vacuum(LVRelState *vacrel)
 	{
 		Assert(!vacrel->do_index_cleanup);
 		vacrel->dead_items->num_items = 0;
+		{
+			const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+			const int64 progress_vals[] = {0};
+
+			pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+		}
+
 		return;
 	}
 
@@ -2301,6 +2315,13 @@ lazy_vacuum(LVRelState *vacrel)
 	 * vacuum)
 	 */
 	vacrel->dead_items->num_items = 0;
+
+	{
+		const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
+
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 }
 
 /*
@@ -2414,12 +2435,23 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 	BlockNumber vacuumed_pages = 0;
 	Buffer		vmbuffer = InvalidBuffer;
 	LVSavedErrInfo saved_err_info;
+#if 0
+	const int	progress_inds[] = {
+		PROGRESS_VACUUM_PHASE,
+		PROGRESS_VACUUM_NUM_DEAD_TUPLES,
+	};
+	const int64 progress_vals[] = {
+		PROGRESS_VACUUM_PHASE_VACUUM_HEAP,
+		0,
+	};
+#endif
 
 	Assert(vacrel->do_index_vacuuming);
 	Assert(vacrel->do_index_cleanup);
 	Assert(vacrel->num_index_scans > 0);
 
 	/* Report that we are now vacuuming the heap */
+	//pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
 
@@ -3190,7 +3222,12 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
 	dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
 	dead_items->max_items = max_items;
 	dead_items->num_items = 0;
+	{
+		const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
 
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 	vacrel->dead_items = dead_items;
 }
 
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a6726..a3acdce5ff5 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -912,6 +912,13 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	int			total = 0;
 	int			done = 0;
 
+	const int	progress_index[] = {
+		PROGRESS_WAITFOR_TOTAL,
+		PROGRESS_WAITFOR_DONE,
+		PROGRESS_WAITFOR_CURRENT_PID
+	};
+	const int64 progress_values[] = {0, 0, 0};
+
 	/* Done if no locks to wait for */
 	if (locktags == NIL)
 		return;
@@ -930,7 +937,10 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	}
 
 	if (progress)
+	{
+		pgstat_progress_update_multi_param(3, progress_index, progress_values);
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, total);
+	}
 
 	/*
 	 * Note: GetLockConflicts() never reports our own xid, hence we need not
@@ -960,19 +970,9 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 				pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, ++done);
 		}
 	}
+
 	if (progress)
-	{
-		const int	index[] = {
-			PROGRESS_WAITFOR_TOTAL,
-			PROGRESS_WAITFOR_DONE,
-			PROGRESS_WAITFOR_CURRENT_PID
-		};
-		const int64 values[] = {
-			0, 0, 0
-		};
-
-		pgstat_progress_update_multi_param(3, index, values);
-	}
+		pgstat_progress_update_multi_param(3, progress_index, progress_values);
 
 	list_free_deep(holders);
 }
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 63f9482b175..41ad6884521 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -117,6 +117,78 @@ pgstat_progress_asserts(void)
 	}
 }
 
+/*
+ * Check that newval >= oldval, and that when total is not being set twice.
+ */
+static void
+pgstat_progress_assert_forward_progress(int command, int index,
+										int64 oldval, int64 newval)
+{
+	switch (command)
+	{
+		case PROGRESS_COMMAND_ANALYZE:
+			/*
+			 * phase goes backwards for inheritance tables, which are sampled
+			 * twice
+			 */
+			if (index != PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID &&
+				index != PROGRESS_ANALYZE_PHASE)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_ANALYZE_BLOCKS_TOTAL ||
+				index == PROGRESS_ANALYZE_EXT_STATS_TOTAL ||
+				index == PROGRESS_ANALYZE_CHILD_TABLES_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			if (index != PROGRESS_CLUSTER_INDEX_RELID)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_CLUSTER_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			if (index != PROGRESS_CREATEIDX_INDEX_OID &&
+				index != PROGRESS_CREATEIDX_SUBPHASE &&
+				index != PROGRESS_WAITFOR_CURRENT_PID &&
+				index != PROGRESS_CREATEIDX_ACCESS_METHOD_OID)
+				Assert(newval >= oldval);
+
+			if (index == PROGRESS_CREATEIDX_TUPLES_TOTAL ||
+				index == PROGRESS_CREATEIDX_PARTITIONS_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL &&
+				oldval == 0 && newval == -1)
+				return;			/* Do nothing: this is the initial "null"
+								 * state before the size is estimated */
+			Assert(newval >= oldval);
+
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL ||
+				index == PROGRESS_BASEBACKUP_TBLSPC_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_COPY_BYTES_TOTAL)
+				Assert(oldval == 0);
+			break;
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_VACUUM_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			if (index == PROGRESS_VACUUM_MAX_DEAD_TUPLES)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -133,6 +205,15 @@ pgstat_progress_update_param(int index, int64 val)
 	if (!beentry || !pgstat_track_activities)
 		return;
 
+	if (index != PROGRESS_SCAN_BLOCKS_DONE)
+	{
+		/* Check that progress does not go backwards */
+		int64		oldval = beentry->st_progress_param[index];
+
+		pgstat_progress_assert_forward_progress(beentry->st_progress_command,
+												index, oldval, val);
+	}
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
-- 
2.34.1

#41Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#40)
Re: Progress report of CREATE INDEX for nested partitioned tables

So I'm still pretty desperately unhappy with count_leaf_partitions.
I don't like expending significant cost purely for progress tracking
purposes, I don't like the undocumented assumption that NoLock is
safe, and what's more, if it is safe then we've already traversed
the inheritance tree to lock everything so in principle we could
have the count already. However, it does seem like getting that
knowledge from point A to point B would be a mess in most places.

One thing we could do to reduce the cost (and improve the safety)
is to forget the idea of checking the relkinds and just set the
PARTITIONS_TOTAL count to list_length() of the find_all_inheritors
result. We've already agreed that it's okay if the PARTITIONS_DONE
count never reaches PARTITIONS_TOTAL, so this would just be taking
that idea further. (Or we could increment PARTITIONS_DONE for
non-leaf partitions when we visit them, thus making that TOTAL
more nearly correct.) Furthermore, as things stand it's not hard
for PARTITIONS_TOTAL to be zero --- there's at least one such case
in the regression tests --- and that seems just weird to me.

By the by, this is awful code:

+ if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))

Consult the definition of RELKIND_HAS_STORAGE to see why.
But I want to get rid of that rather than fixing it.

regards, tom lane

#42Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#41)
3 attachment(s)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:

So I'm still pretty desperately unhappy with count_leaf_partitions.
I don't like expending significant cost purely for progress tracking
purposes, I don't like the undocumented assumption that NoLock is
safe, and what's more, if it is safe then we've already traversed
the inheritance tree to lock everything so in principle we could
have the count already. However, it does seem like getting that
knowledge from point A to point B would be a mess in most places.

One thing we could do to reduce the cost (and improve the safety)
is to forget the idea of checking the relkinds and just set the
PARTITIONS_TOTAL count to list_length() of the find_all_inheritors
result.

Actually list_length() minus 1 ...

We've already agreed that it's okay if the PARTITIONS_DONE
count never reaches PARTITIONS_TOTAL, so this would just be taking
that idea further. (Or we could increment PARTITIONS_DONE for
non-leaf partitions when we visit them, thus making that TOTAL
more nearly correct.)

Yes, I think that's actually more correct. If TOTAL is set without
regard to relkind, then DONE ought to be set the same way.

I updated the documentation to indicate that the counters include the
intermediate partitioned rels, but I wonder if it's better to say
nothing and leave that undefined.

Furthermore, as things stand it's not hard
for PARTITIONS_TOTAL to be zero --- there's at least one such case
in the regression tests --- and that seems just weird to me.

I don't know why it'd seem weird. postgres doesn't create partitions
automatically, so by default there are none. If we create a table but
never load any data, it'll have no partitions. Also, the TOTAL=0 case
won't go away just because we start counting intermediate partitions.

By the by, this is awful code:

+ if (RELKIND_HAS_STORAGE(get_rel_relkind(partrelid)))

Consult the definition of RELKIND_HAS_STORAGE to see why.
But I want to get rid of that rather than fixing it.

Good point, but I'd burden-shift the blame to RELKIND_HAS_STORAGE().

BTW, I promoted myself to a co-author of the patch. My interest here is
to resolve this hoping to allow the CIC patch to progress.

--
Justin

Attachments:

0001-fix-CREATE-INDEX-progress-report-with-nested-partiti.patchtext/x-diff; charset=us-asciiDownload
From b48837a934ef1381d26ef62bbaf66f5e52b0cd6d Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev <ilya.v.gladyshev@gmail.com>
Date: Tue, 31 Jan 2023 19:13:07 +0400
Subject: [PATCH 1/3] fix CREATE INDEX progress report with nested partitions

The progress reporting was added in v12 (ab0dfc961) but the original
patch didn't seem to consider the possibility of nested partitioning.

When called recursively, DefineIndex() would clobber the number of
completed partitions, and it was possible to end up with the TOTAL
counter greater than the DONE counter.

This clarifies/re-defines that the progress reporting counts both direct
and indirect children, as well as intermediate partitioned tables:

- The TOTAL counter is set once at the start of the command.
- For indexes which are newly-built, the recursively-called
DefineIndex() increments the DONE counter.
- For pre-existing indexes which are ATTACHed rather than built,
DefineIndex() increments the DONE counter, unless the attached index is
partitioned, in which case progress report is not updated.

Author: Ilya Gladyshev, Justin Pryzby
Reviewed-By: Tomas Vondra, Dean Rasheed, Alvaro Herrera, Matthias van de Meent, Tom Lane
Discussion: https://www.postgresql.org/message-id/flat/a15f904a70924ffa4ca25c3c744cff31e0e6e143.camel%40gmail.com
---
 doc/src/sgml/monitoring.sgml                  | 10 +++-
 src/backend/bootstrap/bootparse.y             |  2 +
 src/backend/commands/indexcmds.c              | 60 +++++++++++++++++--
 src/backend/commands/tablecmds.c              |  4 +-
 src/backend/tcop/utility.c                    |  6 +-
 src/backend/utils/activity/backend_progress.c | 28 +++++++++
 src/include/commands/defrem.h                 |  1 +
 src/include/utils/backend_progress.h          |  1 +
 8 files changed, 103 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 7ab4424bf13..cf88dca53f3 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6929,7 +6929,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the total number of partitions on which the index is to be created.
+       the total number of partitions on which the index is to be created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, and includes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
@@ -6940,7 +6943,10 @@ FROM pg_stat_get_backend_idset() AS backendid;
       </para>
       <para>
        When creating an index on a partitioned table, this column is set to
-       the number of partitions on which the index has been created.
+       the number of partitions on which the index has been created or attached.
+       In the case of intermediate partitioned tables, this includes both
+       direct and indirect partitions, and includes the intermediate
+       partitioned tables themselves.
        This field is <literal>0</literal> during a <literal>REINDEX</literal>.
       </para></entry>
      </row>
diff --git a/src/backend/bootstrap/bootparse.y b/src/backend/bootstrap/bootparse.y
index 86804bb598e..81a1b7bfec3 100644
--- a/src/backend/bootstrap/bootparse.y
+++ b/src/backend/bootstrap/bootparse.y
@@ -306,6 +306,7 @@ Boot_DeclareIndexStmt:
 								$4,
 								InvalidOid,
 								InvalidOid,
+								-1,
 								false,
 								false,
 								false,
@@ -358,6 +359,7 @@ Boot_DeclareUniqueIndexStmt:
 								$5,
 								InvalidOid,
 								InvalidOid,
+								-1,
 								false,
 								false,
 								false,
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index ff48f44c66f..edb65d4cacb 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -521,6 +521,7 @@ WaitForOlderSnapshots(TransactionId limitXmin, bool progress)
  *		case the caller had better have checked it earlier.
  * 'skip_build': make the catalog entries but don't create the index files
  * 'quiet': suppress the NOTICE chatter ordinarily provided for constraints.
+ * 'total_parts': total number of direct and indirect partitions
  *
  * Returns the object address of the created index.
  */
@@ -530,6 +531,7 @@ DefineIndex(Oid relationId,
 			Oid indexRelationId,
 			Oid parentIndexId,
 			Oid parentConstraintId,
+			int total_parts,
 			bool is_alter_table,
 			bool check_rights,
 			bool check_not_in_use,
@@ -1225,8 +1227,30 @@ DefineIndex(Oid relationId,
 			Relation	parentIndex;
 			TupleDesc	parentDesc;
 
-			pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
-										 nparts);
+			/*
+			 * Set the total number of partitions at the start of the command;
+			 * but don't update it when being called recursively.
+			 */
+			if (!OidIsValid(parentIndexId))
+			{
+				/*
+				 * When called by ProcessUtilitySlow(), the number of
+				 * partitions is passed in as an optimization.  This should
+				 * count partitions the same way.  Subtract one since
+				 * find_all_inheritors() includes the rel itself.
+				 */
+				if (total_parts < 0)
+				{
+					List	   *childs = find_all_inheritors(relationId,
+															 NoLock, NULL);
+
+					total_parts = list_length(childs) - 1;
+					list_free(childs);
+				}
+
+				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_TOTAL,
+											 total_parts);
+			}
 
 			/* Make a local copy of partdesc->oids[], just for safety */
 			memcpy(part_oids, partdesc->oids, sizeof(Oid) * nparts);
@@ -1432,14 +1456,23 @@ DefineIndex(Oid relationId,
 								InvalidOid, /* no predefined OID */
 								indexRelationId,	/* this is our child */
 								createdConstraintId,
+								-1,
 								is_alter_table, check_rights, check_not_in_use,
 								skip_build, quiet);
 					SetUserIdAndSecContext(child_save_userid,
 										   child_save_sec_context);
 				}
+				else
+				{
+					/*
+					 * If a pre-existing index was ATTACHed, the progress
+					 * report is updated here.  A partitioned index is
+					 * likewise counted when attached, but not its partitions,
+					 * since that's expensive, and ATTACH is fast anyway.
+					 */
+					pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+				}
 
-				pgstat_progress_update_param(PROGRESS_CREATEIDX_PARTITIONS_DONE,
-											 i + 1);
 				free_attrmap(attmap);
 			}
 
@@ -1479,6 +1512,16 @@ DefineIndex(Oid relationId,
 		table_close(rel, NoLock);
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+		{
+			/*
+			 * Update progress for a partitioned index itself; the
+			 * recursively-called function will have updated the counter for
+			 * its child indexes.
+			 */
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
+		}
+
 		return address;
 	}
 
@@ -1490,9 +1533,16 @@ DefineIndex(Oid relationId,
 		/* Close the heap and we're done, in the non-concurrent case */
 		table_close(rel, NoLock);
 
-		/* If this is the top-level index, we're done. */
+		/*
+		 * If this is the top-level index, the command is done. When called
+		 * recursively for child tables, the done partition counter is
+		 * incremented now, rather than in the caller, to provide fine-grained
+		 * progress reporting in the case of intermediate partitioning.
+		 */
 		if (!OidIsValid(parentIndexId))
 			pgstat_progress_end_command();
+		else
+			pgstat_progress_incr_param(PROGRESS_CREATEIDX_PARTITIONS_DONE, 1);
 
 		return address;
 	}
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3e2c5f797cd..cc74d703f91 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1216,6 +1216,7 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId,
 						InvalidOid,
 						RelationGetRelid(idxRel),
 						constraintOid,
+						-1,
 						false, false, false, false, false);
 
 			index_close(idxRel, AccessShareLock);
@@ -8640,6 +8641,7 @@ ATExecAddIndex(AlteredTableInfo *tab, Relation rel,
 						  InvalidOid,	/* no predefined OID */
 						  InvalidOid,	/* no parent index */
 						  InvalidOid,	/* no parent constraint */
+						  -1,			/* total_parts unknown */
 						  true, /* is_alter_table */
 						  check_rights,
 						  false,	/* check_not_in_use - we did it already */
@@ -18105,7 +18107,7 @@ AttachPartitionEnsureIndexes(Relation rel, Relation attachrel)
 										   &conOid);
 			DefineIndex(RelationGetRelid(attachrel), stmt, InvalidOid,
 						RelationGetRelid(idxRel),
-						conOid,
+						conOid, -1,
 						true, false, false, false, false);
 		}
 
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index eada7353639..2dc92c72cea 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1465,6 +1465,7 @@ ProcessUtilitySlow(ParseState *pstate,
 					Oid			relid;
 					LOCKMODE	lockmode;
 					bool		is_alter_table;
+					int			nparts = 0;
 
 					if (stmt->concurrent)
 						PreventInTransactionBlock(isTopLevel,
@@ -1503,9 +1504,11 @@ ProcessUtilitySlow(ParseState *pstate,
 						List	   *inheritors = NIL;
 
 						inheritors = find_all_inheritors(relid, lockmode, NULL);
+						nparts = list_length(inheritors) - 1;
 						foreach(lc, inheritors)
 						{
-							char		relkind = get_rel_relkind(lfirst_oid(lc));
+							Oid			partrelid = lfirst_oid(lc);
+							char		relkind = get_rel_relkind(partrelid);
 
 							if (relkind != RELKIND_RELATION &&
 								relkind != RELKIND_MATVIEW &&
@@ -1548,6 +1551,7 @@ ProcessUtilitySlow(ParseState *pstate,
 									InvalidOid, /* no predefined OID */
 									InvalidOid, /* no parent index */
 									InvalidOid, /* no parent constraint */
+									nparts,
 									is_alter_table,
 									true,	/* check_rights */
 									true,	/* check_not_in_use */
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index d96af812b19..2a9994b98fd 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -58,6 +58,34 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*-----------
+ * pgstat_progress_incr_param() -
+ *
+ * Increment index'th member in st_progress_param[] of the current backend.
+ *-----------
+ */
+void
+pgstat_progress_incr_param(int index, int64 incr)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	int64		val;
+
+	Assert(index >= 0 && index < PGSTAT_NUM_PROGRESS_PARAM);
+
+	if (!beentry || !pgstat_track_activities)
+		return;
+
+	/*
+	 * Because no other process should write to this backend's own status, we
+	 * can read its value from shared memory without needing to loop to ensure
+	 * its consistency.
+	 */
+	val = beentry->st_progress_param[index];
+	val += incr;
+
+	pgstat_progress_update_param(index, val);
+}
+
 /*-----------
  * pgstat_progress_update_multi_param() -
  *
diff --git a/src/include/commands/defrem.h b/src/include/commands/defrem.h
index 4f7f87fc62c..478203ed4c4 100644
--- a/src/include/commands/defrem.h
+++ b/src/include/commands/defrem.h
@@ -29,6 +29,7 @@ extern ObjectAddress DefineIndex(Oid relationId,
 								 Oid indexRelationId,
 								 Oid parentIndexId,
 								 Oid parentConstraintId,
+								 int total_parts,
 								 bool is_alter_table,
 								 bool check_rights,
 								 bool check_not_in_use,
diff --git a/src/include/utils/backend_progress.h b/src/include/utils/backend_progress.h
index 005e5d75ab6..a84752ade99 100644
--- a/src/include/utils/backend_progress.h
+++ b/src/include/utils/backend_progress.h
@@ -36,6 +36,7 @@ typedef enum ProgressCommandType
 extern void pgstat_progress_start_command(ProgressCommandType cmdtype,
 										  Oid relid);
 extern void pgstat_progress_update_param(int index, int64 val);
+extern void pgstat_progress_incr_param(int index, int64 incr);
 extern void pgstat_progress_update_multi_param(int nparam, const int *index,
 											   const int64 *val);
 extern void pgstat_progress_end_command(void);
-- 
2.34.1

0002-assertions-for-progress-reporting.patchtext/x-diff; charset=us-asciiDownload
From b9bfc805aa17e07ab7a92b6478daa9ff09b2b8e6 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 1 Feb 2023 10:23:53 -0600
Subject: [PATCH 2/3] assertions for progress reporting

---
 src/backend/commands/analyze.c                | 10 ++-
 src/backend/utils/activity/backend_progress.c | 84 +++++++++++++++++++
 2 files changed, 92 insertions(+), 2 deletions(-)

diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 65750958bb2..3bfc941aa2c 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1145,6 +1145,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	TableScanDesc scan;
 	BlockNumber nblocks;
 	BlockNumber blksdone = 0;
+	int64		progress_vals[2] = {0};
+	int const	progress_inds[2] = {
+		PROGRESS_ANALYZE_BLOCKS_DONE,
+		PROGRESS_ANALYZE_BLOCKS_TOTAL
+	};
+
 #ifdef USE_PREFETCH
 	int			prefetch_maximum = 0;	/* blocks to prefetch if enabled */
 	BlockSamplerData prefetch_bs;
@@ -1169,8 +1175,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 #endif
 
 	/* Report sampling block numbers */
-	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
-								 nblocks);
+	progress_vals[1] = nblocks;
+	pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 2a9994b98fd..63f9482b175 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -10,6 +10,7 @@
  */
 #include "postgres.h"
 
+#include "commands/progress.h"
 #include "port/atomics.h"		/* for memory barriers */
 #include "utils/backend_progress.h"
 #include "utils/backend_status.h"
@@ -37,6 +38,85 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
 }
 
+/*
+ * Check for consistency of progress data (current < total).
+ *
+ * Check during pgstat_progress_updates_*() rather than only from
+ * pgstat_progress_end_command() to catch issues with uninitialized/stale data
+ * from previous progress commands.
+ *
+ * If a command fails due to interrupt or error, the values may be less than
+ * the expected final value.
+ */
+static void
+pgstat_progress_asserts(void)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+	volatile int64 *a = beentry->st_progress_param;
+
+	switch (beentry->st_progress_command)
+	{
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_HEAP_BLKS_VACUUMED] <=
+				   a[PROGRESS_VACUUM_TOTAL_HEAP_BLKS]);
+			Assert(a[PROGRESS_VACUUM_NUM_DEAD_TUPLES] <=
+				   a[PROGRESS_VACUUM_MAX_DEAD_TUPLES]);
+			break;
+
+		case PROGRESS_COMMAND_ANALYZE:
+			Assert(a[PROGRESS_ANALYZE_BLOCKS_DONE] <=
+				   a[PROGRESS_ANALYZE_BLOCKS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_EXT_STATS_COMPUTED] <=
+				   a[PROGRESS_ANALYZE_EXT_STATS_TOTAL]);
+			Assert(a[PROGRESS_ANALYZE_CHILD_TABLES_DONE] <=
+				   a[PROGRESS_ANALYZE_CHILD_TABLES_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			Assert(a[PROGRESS_CLUSTER_HEAP_BLKS_SCANNED] <=
+				   a[PROGRESS_CLUSTER_TOTAL_HEAP_BLKS]);
+			/* FALLTHROUGH */
+			/* ..because CLUSTER rebuilds indexes */
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			Assert(a[PROGRESS_CREATEIDX_TUPLES_DONE] <=
+				   a[PROGRESS_CREATEIDX_TUPLES_TOTAL]);
+			Assert(a[PROGRESS_CREATEIDX_PARTITIONS_DONE] <=
+				   a[PROGRESS_CREATEIDX_PARTITIONS_TOTAL]);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			/* progress reporting is optional for these */
+			if (a[PROGRESS_BASEBACKUP_BACKUP_TOTAL] >= 0)
+			{
+				Assert(a[PROGRESS_BASEBACKUP_BACKUP_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_BACKUP_TOTAL]);
+				Assert(a[PROGRESS_BASEBACKUP_TBLSPC_STREAMED] <=
+					   a[PROGRESS_BASEBACKUP_TBLSPC_TOTAL]);
+			}
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+#if 0
+			//This currently fails file_fdw tests, since pgstat_prorgress evidently fails
+				// to support simultaneous copy commands, as happens during JOIN.
+			/* bytes progress is not available in all cases */
+				if (a[PROGRESS_COPY_BYTES_TOTAL] > 0)
+				//Assert(a[PROGRESS_COPY_BYTES_PROCESSED] <= a[PROGRESS_COPY_BYTES_TOTAL]);
+			if (a[PROGRESS_COPY_BYTES_PROCESSED] > a[PROGRESS_COPY_BYTES_TOTAL])
+				elog(WARNING, "PROGRESS_COPY_BYTES_PROCESSED %ld %ld",
+					 a[PROGRESS_COPY_BYTES_PROCESSED],
+					 a[PROGRESS_COPY_BYTES_TOTAL]);
+#endif
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;				/* Do nothing */
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -56,6 +136,8 @@ pgstat_progress_update_param(int index, int64 val)
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
@@ -113,6 +195,8 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
 	}
 
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
+
+	pgstat_progress_asserts();
 }
 
 /*-----------
-- 
2.34.1

0003-f-also-assert-that-progress-values-don-t-go-backward.patchtext/x-diff; charset=us-asciiDownload
From 68403c5123a0b475affbf89626f7364adce1df24 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Sat, 21 Jan 2023 21:41:04 -0600
Subject: [PATCH 3/3] f! also assert that progress values don't go backwards
 and the total is constant

pgstat_progress_update_multi_param() is being abused as a way to update
a value which might otherwise violate the assertion.

See also:
https://www.postgresql.org/message-id/CA%2BTgmoYSvEP3weQaCPGf6%2BDXLy2__JbJUYtoXyWP%3DqHcyGbihA%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c          | 37 +++++++++
 src/backend/storage/lmgr/lmgr.c               | 24 +++---
 src/backend/utils/activity/backend_progress.c | 81 +++++++++++++++++++
 3 files changed, 130 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 8f14cf85f38..61cfbf6a17a 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1042,6 +1042,13 @@ lazy_scan_heap(LVRelState *vacrel)
 
 				/* Forget the LP_DEAD items that we just vacuumed */
 				dead_items->num_items = 0;
+				{
+					const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+					const int64 progress_vals[] = {0};
+
+					pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+				}
+
 
 				/*
 				 * Periodically perform FSM vacuuming to make newly-freed
@@ -2199,6 +2206,13 @@ lazy_vacuum(LVRelState *vacrel)
 	{
 		Assert(!vacrel->do_index_cleanup);
 		vacrel->dead_items->num_items = 0;
+		{
+			const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+			const int64 progress_vals[] = {0};
+
+			pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+		}
+
 		return;
 	}
 
@@ -2301,6 +2315,13 @@ lazy_vacuum(LVRelState *vacrel)
 	 * vacuum)
 	 */
 	vacrel->dead_items->num_items = 0;
+
+	{
+		const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
+
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 }
 
 /*
@@ -2414,12 +2435,23 @@ lazy_vacuum_heap_rel(LVRelState *vacrel)
 	BlockNumber vacuumed_pages = 0;
 	Buffer		vmbuffer = InvalidBuffer;
 	LVSavedErrInfo saved_err_info;
+#if 0
+	const int	progress_inds[] = {
+		PROGRESS_VACUUM_PHASE,
+		PROGRESS_VACUUM_NUM_DEAD_TUPLES,
+	};
+	const int64 progress_vals[] = {
+		PROGRESS_VACUUM_PHASE_VACUUM_HEAP,
+		0,
+	};
+#endif
 
 	Assert(vacrel->do_index_vacuuming);
 	Assert(vacrel->do_index_cleanup);
 	Assert(vacrel->num_index_scans > 0);
 
 	/* Report that we are now vacuuming the heap */
+	//pgstat_progress_update_multi_param(2, progress_inds, progress_vals);
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
 
@@ -3190,7 +3222,12 @@ dead_items_alloc(LVRelState *vacrel, int nworkers)
 	dead_items = (VacDeadItems *) palloc(vac_max_items_to_alloc_size(max_items));
 	dead_items->max_items = max_items;
 	dead_items->num_items = 0;
+	{
+		const int	progress_inds[] = {PROGRESS_VACUUM_NUM_DEAD_TUPLES};
+		const int64 progress_vals[] = {0};
 
+		pgstat_progress_update_multi_param(1, progress_inds, progress_vals);
+	}
 	vacrel->dead_items = dead_items;
 }
 
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index ee9b89a6726..a3acdce5ff5 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -912,6 +912,13 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	int			total = 0;
 	int			done = 0;
 
+	const int	progress_index[] = {
+		PROGRESS_WAITFOR_TOTAL,
+		PROGRESS_WAITFOR_DONE,
+		PROGRESS_WAITFOR_CURRENT_PID
+	};
+	const int64 progress_values[] = {0, 0, 0};
+
 	/* Done if no locks to wait for */
 	if (locktags == NIL)
 		return;
@@ -930,7 +937,10 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 	}
 
 	if (progress)
+	{
+		pgstat_progress_update_multi_param(3, progress_index, progress_values);
 		pgstat_progress_update_param(PROGRESS_WAITFOR_TOTAL, total);
+	}
 
 	/*
 	 * Note: GetLockConflicts() never reports our own xid, hence we need not
@@ -960,19 +970,9 @@ WaitForLockersMultiple(List *locktags, LOCKMODE lockmode, bool progress)
 				pgstat_progress_update_param(PROGRESS_WAITFOR_DONE, ++done);
 		}
 	}
+
 	if (progress)
-	{
-		const int	index[] = {
-			PROGRESS_WAITFOR_TOTAL,
-			PROGRESS_WAITFOR_DONE,
-			PROGRESS_WAITFOR_CURRENT_PID
-		};
-		const int64 values[] = {
-			0, 0, 0
-		};
-
-		pgstat_progress_update_multi_param(3, index, values);
-	}
+		pgstat_progress_update_multi_param(3, progress_index, progress_values);
 
 	list_free_deep(holders);
 }
diff --git a/src/backend/utils/activity/backend_progress.c b/src/backend/utils/activity/backend_progress.c
index 63f9482b175..41ad6884521 100644
--- a/src/backend/utils/activity/backend_progress.c
+++ b/src/backend/utils/activity/backend_progress.c
@@ -117,6 +117,78 @@ pgstat_progress_asserts(void)
 	}
 }
 
+/*
+ * Check that newval >= oldval, and that when total is not being set twice.
+ */
+static void
+pgstat_progress_assert_forward_progress(int command, int index,
+										int64 oldval, int64 newval)
+{
+	switch (command)
+	{
+		case PROGRESS_COMMAND_ANALYZE:
+			/*
+			 * phase goes backwards for inheritance tables, which are sampled
+			 * twice
+			 */
+			if (index != PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID &&
+				index != PROGRESS_ANALYZE_PHASE)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_ANALYZE_BLOCKS_TOTAL ||
+				index == PROGRESS_ANALYZE_EXT_STATS_TOTAL ||
+				index == PROGRESS_ANALYZE_CHILD_TABLES_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CLUSTER:
+			if (index != PROGRESS_CLUSTER_INDEX_RELID)
+				Assert(newval >= oldval);
+			if (index == PROGRESS_CLUSTER_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_CREATE_INDEX:
+			if (index != PROGRESS_CREATEIDX_INDEX_OID &&
+				index != PROGRESS_CREATEIDX_SUBPHASE &&
+				index != PROGRESS_WAITFOR_CURRENT_PID &&
+				index != PROGRESS_CREATEIDX_ACCESS_METHOD_OID)
+				Assert(newval >= oldval);
+
+			if (index == PROGRESS_CREATEIDX_TUPLES_TOTAL ||
+				index == PROGRESS_CREATEIDX_PARTITIONS_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_BASEBACKUP:
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL &&
+				oldval == 0 && newval == -1)
+				return;			/* Do nothing: this is the initial "null"
+								 * state before the size is estimated */
+			Assert(newval >= oldval);
+
+			if (index == PROGRESS_BASEBACKUP_BACKUP_TOTAL ||
+				index == PROGRESS_BASEBACKUP_TBLSPC_TOTAL)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_COPY:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_COPY_BYTES_TOTAL)
+				Assert(oldval == 0);
+			break;
+		case PROGRESS_COMMAND_VACUUM:
+			Assert(newval >= oldval);
+			if (index == PROGRESS_VACUUM_TOTAL_HEAP_BLKS)
+				Assert(oldval == 0);
+			if (index == PROGRESS_VACUUM_MAX_DEAD_TUPLES)
+				Assert(oldval == 0);
+			break;
+
+		case PROGRESS_COMMAND_INVALID:
+			break;
+	}
+}
+
 /*-----------
  * pgstat_progress_update_param() -
  *
@@ -133,6 +205,15 @@ pgstat_progress_update_param(int index, int64 val)
 	if (!beentry || !pgstat_track_activities)
 		return;
 
+	if (index != PROGRESS_SCAN_BLOCKS_DONE)
+	{
+		/* Check that progress does not go backwards */
+		int64		oldval = beentry->st_progress_param[index];
+
+		pgstat_progress_assert_forward_progress(beentry->st_progress_command,
+												index, oldval, val);
+	}
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
 	beentry->st_progress_param[index] = val;
 	PGSTAT_END_WRITE_ACTIVITY(beentry);
-- 
2.34.1

#43Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#42)
Re: Progress report of CREATE INDEX for nested partitioned tables

Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:

Furthermore, as things stand it's not hard
for PARTITIONS_TOTAL to be zero --- there's at least one such case
in the regression tests --- and that seems just weird to me.

I don't know why it'd seem weird. postgres doesn't create partitions
automatically, so by default there are none. If we create a table but
never load any data, it'll have no partitions.

My problem with it is that it's not clear how to tell "no partitioned
index creation in progress" from "partitioned index creation in progress,
but total = 0". Maybe there's some out-of-band way to tell that in the
stats reporting system, but still it's a weird corner case.

Also, the TOTAL=0 case
won't go away just because we start counting intermediate partitions.

That's why I wanted list_length() not list_length() - 1. We are
doing *something* at the top partitioned table, it just doesn't
involve a table scan, so I don't find this totally unreasonable.
If you agree we are doing work at intermediate partitioned tables,
how are we not doing work at the top one?

regards, tom lane

#44Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#43)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sat, Mar 25, 2023 at 11:55:13AM -0400, Tom Lane wrote:

Justin Pryzby <pryzby@telsasoft.com> writes:

On Thu, Mar 23, 2023 at 04:35:46PM -0400, Tom Lane wrote:

Furthermore, as things stand it's not hard
for PARTITIONS_TOTAL to be zero --- there's at least one such case
in the regression tests --- and that seems just weird to me.

I don't know why it'd seem weird. postgres doesn't create partitions
automatically, so by default there are none. If we create a table but
never load any data, it'll have no partitions.

My problem with it is that it's not clear how to tell "no partitioned
index creation in progress" from "partitioned index creation in progress,
but total = 0". Maybe there's some out-of-band way to tell that in the
stats reporting system, but still it's a weird corner case.

Also, the TOTAL=0 case
won't go away just because we start counting intermediate partitions.

That's why I wanted list_length() not list_length() - 1. We are
doing *something* at the top partitioned table, it just doesn't
involve a table scan, so I don't find this totally unreasonable.
If you agree we are doing work at intermediate partitioned tables,
how are we not doing work at the top one?

What you're proposing would redefine the meaning of
PARTITIONS_DONE/TOTAL, even in the absence of intermediate partitioned
tables. Which might be okay, but the scope of this thread/patch was to
fix the behavior involving intermediate partitioned tables.

It's somewhat weird to me that find_all_inheritors(rel) returns the rel
itself. But it's an internal function, and evidently that's what's
needed/desirable to do, so that's fine.

However, "PARTITIONS_TOTAL" has a certain user-facing definition, and
"Number of partitions" is easier to explain than "Number of partitions
plus the rel itself", and IMO an easier definition is a better one.

Your complaint seems similar to something I've said a few times before:
it's weird to expose macroscopic progress reporting of partitioned
tables in the same view and in the same *row* as microscopic progress of
its partitions. But changing that is a job for another patch. I won't
be opposed to it if someone were to propose a patch to remove
partitions_{done,total}. See also:
/messages/by-id/YCy5ZMt8xAyoOMmv@paquier.xyz

--
Justin

#45Tom Lane
tgl@sss.pgh.pa.us
In reply to: Justin Pryzby (#44)
Re: Progress report of CREATE INDEX for nested partitioned tables

Justin Pryzby <pryzby@telsasoft.com> writes:

On Sat, Mar 25, 2023 at 11:55:13AM -0400, Tom Lane wrote:

That's why I wanted list_length() not list_length() - 1. We are
doing *something* at the top partitioned table, it just doesn't
involve a table scan, so I don't find this totally unreasonable.
If you agree we are doing work at intermediate partitioned tables,
how are we not doing work at the top one?

What you're proposing would redefine the meaning of
PARTITIONS_DONE/TOTAL, even in the absence of intermediate partitioned
tables. Which might be okay, but the scope of this thread/patch was to
fix the behavior involving intermediate partitioned tables.

I'm a little skeptical of that argument, because this patch is already
redefining the meaning of PARTITIONS_TOTAL. The fact that the existing
documentation is vague enough to be read either way doesn't make it not
a change.

Still, in the interests of getting something done I'll drop the issue.

regards, tom lane

#46Tom Lane
tgl@sss.pgh.pa.us
In reply to: Tom Lane (#45)
Re: Progress report of CREATE INDEX for nested partitioned tables

I pushed 0001 with some cosmetic changes (for instance, trying to
make the style of the doc entries for partitions_total/partitions_done
match the rest of their table).

I'm not touching 0002 or 0003, because I think they're fundamentally
a bad idea. Progress reporting is inherently inexact, because it's
so hard to predict the amount of work to be done in advance -- have
you ever seen a system anywhere whose progress bars reliably advance
at a uniform rate? I think adding assertions that the estimates are
error-free is just going to cause headaches. As an example, I added
a comment pointing out that the current fix won't crash and burn if
the caller failed to lock all the child tables in advance: the
find_all_inheritors call should be safe anyway, so the worst consequence
would be an imprecise partitions_total estimate. But that argument
falls down if we're going to add assertions that partitions_total
isn't in error.

regards, tom lane

#47Justin Pryzby
pryzby@telsasoft.com
In reply to: Tom Lane (#46)
Re: Progress report of CREATE INDEX for nested partitioned tables

On Sat, Mar 25, 2023 at 03:43:32PM -0400, Tom Lane wrote:

I pushed 0001 with some cosmetic changes (for instance, trying to
make the style of the doc entries for partitions_total/partitions_done
match the rest of their table).

Thanks.

I'm not touching 0002 or 0003, because I think they're fundamentally
a bad idea. Progress reporting is inherently inexact, because it's

Nobody could disagree that it's inexact. The assertions are for minimal
sanity tests and consistency. Like if "total" is set multiple times (as
in this patch), or if a progress value goes backwards. Anyway the
assertions exposed two other issues that would need to be fixed before
the assertions themselves could be proposed.

--
Justin