progress report for ANALYZE

Started by Alvaro Herreraover 6 years ago64 messages

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

1 attachment(s)

Hello

Here's a patch that implements progress reporting for ANALYZE.

Thanks

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v1-0001-Report-progress-for-ANALYZE.patchtext/x-diff; charset=us-asciiDownload

From 572fea555bc1c9d3512f7806aed6b97a374199eb Mon Sep 17 00:00:00 2001
From: Alvaro Herrera <alvherre@alvh.no-ip.org>
Date: Fri, 21 Jun 2019 13:35:13 -0400
Subject: [PATCH v1] Report progress for ANALYZE

---
 doc/src/sgml/monitoring.sgml         | 129 +++++++++++++++++++++++++++
 src/backend/catalog/system_views.sql |  15 ++++
 src/backend/commands/analyze.c       |  55 +++++++++++-
 src/backend/utils/adt/pgstatfuncs.c  |   2 +
 src/backend/utils/misc/sampling.c    |   6 +-
 src/include/commands/progress.h      |  14 +++
 src/include/pgstat.h                 |   1 +
 src/include/utils/sampling.h         |   2 +-
 src/test/regress/expected/rules.out  |  17 ++++
 9 files changed, 237 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..a924ab3b06 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
@@ -3927,6 +3935,127 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for ech backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether the current scan includes legacy inheritance children.</entry>
+     </row>
+     <row>
+      <entry><structfield>scanning_table</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>
+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>heap_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks to scan in the current table.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>scanning table</literal></entry>
+     <entry>
+       The command is currently scanning the table.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>analyzing sample</literal></entry>
+     <entry>
+       <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>analyzing sample (extended)</literal></entry>
+     <entry>
+       <command>ANALYZE</command> is currently extracting statistical data
+       for extended statistics from the sample obtained.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..e497f59790 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,21 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS scanning_table,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'scanning table'
+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'
+                      END AS phase,
+        S.param4 AS heap_blks_total, S.param5 AS heap_blks_scanned
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 6cb545c126..0a75549440 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -493,6 +498,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	/*
 	 * Acquire the sample rows
 	 */
+	{
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_INH
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_SCAN_TABLE,
+			inh
+		};
+
+		pgstat_progress_update_multi_param(2, index, val);
+	}
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_BLOCKS_DONE
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_ANALYSIS,
+			0, 0
+		};
+
+		pgstat_progress_update_multi_param(3, index, val);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -574,10 +602,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		}
 
 		/* Build extended statistics (if there are any). */
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_ANALYSIS_EXTENDED);
 		BuildRelationExtStatistics(onerel, totalrows, numrows, rows, attr_cnt,
 								   vacattrstats);
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_COMPLETE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1016,6 +1049,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	double		blksdone = 0;
 
 	Assert(targrows > 0);
 
@@ -1025,7 +1060,20 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+	{
+		const int   index[] = {
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_SCANREL
+		};
+		const int64 val[] = {
+			nblocks,
+			RelationGetRelid(onerel)
+		};
+
+		pgstat_progress_update_multi_param(2, index, val);
+	}
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1086,6 +1134,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..98b01e54fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537979..f7daece5ee 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313cb3..3007351922 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_INH					1
+#define PROGRESS_ANALYZE_SCANREL				2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			3
+#define PROGRESS_ANALYZE_BLOCKS_DONE			4
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE		1
+#define PROGRESS_ANALYZE_PHASE_ANALYSIS			2
+#define PROGRESS_ANALYZE_PHASE_ANALYSIS_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_COMPLETE			4
+
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a188..9344654b69 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -956,6 +956,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507fb5..76d31dc126 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..dd3191563d 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,23 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS scanning_table,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'analyzing sample'::text
+            WHEN 3 THEN 'analyzing sample (extended stats)'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param4 AS heap_blks_total,
+    s.param5 AS heap_blks_scanned
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,
-- 
2.17.1

Julien Rouhaud

rjuju123@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#1)

Re: progress report for ANALYZE

On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Here's a patch that implements progress reporting for ANALYZE.

Patch applies, code and doc and compiles cleanly. I have few comments:

@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
    if (numrows > 0)
    {
        MemoryContext col_context,
-                   old_context;
+                     old_context;
+       const int   index[] = {
+           PROGRESS_ANALYZE_PHASE,
+           PROGRESS_ANALYZE_TOTAL_BLOCKS,
+           PROGRESS_ANALYZE_BLOCKS_DONE
+       };
+       const int64 val[] = {
+           PROGRESS_ANALYZE_PHASE_ANALYSIS,
+           0, 0
+       };
+
+       pgstat_progress_update_multi_param(3, index, val);
[...]
    }
+   pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                PROGRESS_ANALYZE_PHASE_COMPLETE);
+
If there wasn't any row but multiple blocks were scanned, the
PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about
the blocks that were scanned.  I'm not sure if we should stay
consistent here.

diff --git a/src/backend/utils/adt/pgstatfuncs.c
b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..98b01e54fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
    /* Translate command name into command type code. */
    if (pg_strcasecmp(cmd, "VACUUM") == 0)
        cmdtype = PROGRESS_COMMAND_VACUUM;
+   if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+       cmdtype = PROGRESS_COMMAND_ANALYZE;
    else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
        cmdtype = PROGRESS_COMMAND_CLUSTER;
    else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)

it should be an "else if" here.

Everything else LGTM.

Anthony Nowocien

anowocien@gmail.com

over 6 years ago

In reply to: Julien Rouhaud (#2)

Re: progress report for ANALYZE

Hi,
In monitoring.sgml, "a" is missing in "row for ech backend that is
currently running that command[...]".
Anthony

On Tuesday, July 2, 2019, Julien Rouhaud <rjuju123@gmail.com> wrote:

On Fri, Jun 21, 2019 at 8:52 PM Alvaro Herrera <alvherre@2ndquadrant.com>

wrote:

Show quoted text

Here's a patch that implements progress reporting for ANALYZE.

Patch applies, code and doc and compiles cleanly. I have few comments:

@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
if (numrows > 0)
{
MemoryContext col_context,
-                   old_context;
+                     old_context;
+       const int   index[] = {
+           PROGRESS_ANALYZE_PHASE,
+           PROGRESS_ANALYZE_TOTAL_BLOCKS,
+           PROGRESS_ANALYZE_BLOCKS_DONE
+       };
+       const int64 val[] = {
+           PROGRESS_ANALYZE_PHASE_ANALYSIS,
+           0, 0
+       };
+
+       pgstat_progress_update_multi_param(3, index, val);
[...]
}
+   pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+                                PROGRESS_ANALYZE_PHASE_COMPLETE);
+
If there wasn't any row but multiple blocks were scanned, the
PROGRESS_ANALYZE_PHASE_COMPLETE will still show the informations about
the blocks that were scanned.  I'm not sure if we should stay
consistent here.

diff --git a/src/backend/utils/adt/pgstatfuncs.c
b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..98b01e54fa 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
/* Translate command name into command type code. */
if (pg_strcasecmp(cmd, "VACUUM") == 0)
cmdtype = PROGRESS_COMMAND_VACUUM;
+   if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+       cmdtype = PROGRESS_COMMAND_ANALYZE;
else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
cmdtype = PROGRESS_COMMAND_CLUSTER;
else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)

it should be an "else if" here.

Everything else LGTM.

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

over 6 years ago

In reply to: Alvaro Herrera (#1)

Re: progress report for ANALYZE

Hi Alvaro!

On 2019/06/22 3:52, Alvaro Herrera wrote:

Hello

Here's a patch that implements progress reporting for ANALYZE.

Sorry for the late reply.
My email address was changed to tatsuro.yamada.tf@nttcom.co.jp.

I have a question about your patch.
My ex-colleague Vinayak created same patch in 2017 [1]ANALYZE command progress checker /messages/by-id/968b4eda-2417-8b7b-d468-71643cf088b6@openscg.com, and he
couldn't get commit because there are some reasons such as the
patch couldn't handle analyzing Foreign table. Therefore, I wonder
whether your patch is able to do that or not.

However, actually, I think it's okay because the feature is useful
for DBAs, even if your patch can't handle Foreign table.

I'll review your patch in this week. :)

[1]: ANALYZE command progress checker /messages/by-id/968b4eda-2417-8b7b-d468-71643cf088b6@openscg.com
/messages/by-id/968b4eda-2417-8b7b-d468-71643cf088b6@openscg.com

Regards,
Tatsuro Yamada

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

over 6 years ago

In reply to: Tatsuro Yamada (#4)

Re: progress report for ANALYZE

Hi Alvaro,

I'll review your patch in this week. :)

I tested your patch on 6b854896.
Here is a result. See below:

---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;

[Session #2]
\a
\t
select * from pg_stat_progress_analyze; \watch 0.001

17520|13599|postgres|16387|f|16387|scanning table|4425|14
17520|13599|postgres|16387|f|16387|scanning table|4425|64
17520|13599|postgres|16387|f|16387|scanning table|4425|111
...
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0 <-- Is it Okay??
---------------------------------------------------------

I have a question of the last line of the result.
I'm not sure it is able or not, but I guess it would be better
to keep the phase name (analyzing sample) on the view until the
end of this command. :)

Regards,
Tatsuro Yamada

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Tatsuro Yamada (#5)

Re: progress report for ANALYZE

On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0 <-- Is it Okay??

Why do we zero out the block numbers when we switch phases? The
CREATE INDEX progress reporting patch does that kind of thing too, and
it seems like poor behavior to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Robert Haas (#6)

Re: progress report for ANALYZE

On 2019-Jul-08, Robert Haas wrote:

On Mon, Jul 8, 2019 at 5:29 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

17520|13599|postgres|16387|f|16387|scanning table|4425|4425
17520|13599|postgres|16387|f|16387|analyzing sample|0|0
17520|13599|postgres|16387|f|16387||0|0 <-- Is it Okay??

Why do we zero out the block numbers when we switch phases? The
CREATE INDEX progress reporting patch does that kind of thing too, and
it seems like poor behavior to me.

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#7)

Re: progress report for ANALYZE

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Julien Rouhaud

rjuju123@gmail.com

over 6 years ago

In reply to: Robert Haas (#8)

Re: progress report for ANALYZE

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.

Note that this patch is already behaving like that if the table only
contains dead rows.

#10

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

over 6 years ago

In reply to: Julien Rouhaud (#9)

1 attachment(s)

Re: progress report for ANALYZE

Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the end.

+1

Note that this patch is already behaving like that if the table only
contains dead rows.

I fixed the patch including:

- Replace "if" to "else if". (Suggested by Julien)
- Fix typo s/ech/each/. (Suggested by Anthony)
- Add Phase "analyzing complete" in the pgstat view. (Suggested by Julien, Robert and me)
It was overlooked to add it in system_views.sql.

I share my re-test result, see below:

---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;

[Session #2]
\a \t
select * from pg_stat_progress_analyze; \watch 0.001

3785|13599|postgres|16384|f|16384|scanning table|4425|6
3785|13599|postgres|16384|f|16384|scanning table|4425|31
3785|13599|postgres|16384|f|16384|scanning table|4425|70
3785|13599|postgres|16384|f|16384|scanning table|4425|109
...
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|analyzing sample|0|0
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and fixed. :)
---------------------------------------------------------

Thanks,
Tatsuro Yamada

Attachments:

v2-0001-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v2-0001-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..c368444 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -361,6 +361,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </row>
 
      <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
+     <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
        <command>CLUSTER</command> or <command>VACUUM FULL</command>, showing current progress.
@@ -3927,6 +3935,134 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether the current scan includes legacy inheritance children.</entry>
+     </row>
+     <row>
+      <entry><structfield>scanning_table</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>
+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>heap_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks to scan in the current table.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>scanning table</literal></entry>
+     <entry>
+       The command is currently scanning the table.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>analyzing sample</literal></entry>
+     <entry>
+       <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>analyzing sample (extended)</literal></entry>
+     <entry>
+       <command>ANALYZE</command> is currently extracting statistical data
+       for extended statistics from the sample obtained.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>analyzing complete</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..d321fb9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS scanning_table,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'scanning table'
+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'
+                      WHEN 4 THEN 'analyzing complete'
+                      END AS phase,
+        S.param4 AS heap_blks_total, S.param5 AS heap_blks_scanned
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d7004e5..445164e 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -493,6 +498,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	/*
 	 * Acquire the sample rows
 	 */
+	{
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_INH
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_SCAN_TABLE,
+			inh
+		};
+
+		pgstat_progress_update_multi_param(2, index, val);
+	}
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -512,7 +529,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_BLOCKS_DONE
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_ANALYSIS,
+			0, 0
+		};
+
+		pgstat_progress_update_multi_param(3, index, val);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -574,10 +602,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		}
 
 		/* Build extended statistics (if there are any). */
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_ANALYSIS_EXTENDED);
 		BuildRelationExtStatistics(onerel, totalrows, numrows, rows, attr_cnt,
 								   vacattrstats);
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_COMPLETE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1016,6 +1049,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	double		blksdone = 0;
 
 	Assert(targrows > 0);
 
@@ -1025,7 +1060,20 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+	{
+		const int   index[] = {
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_SCANREL
+		};
+		const int64 val[] = {
+			nblocks,
+			RelationGetRelid(onerel)
+		};
+
+		pgstat_progress_update_multi_param(2, index, val);
+	}
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1086,6 +1134,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..db2cc5c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537..f7daece 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313..3007351 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_INH					1
+#define PROGRESS_ANALYZE_SCANREL				2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			3
+#define PROGRESS_ANALYZE_BLOCKS_DONE			4
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE		1
+#define PROGRESS_ANALYZE_PHASE_ANALYSIS			2
+#define PROGRESS_ANALYZE_PHASE_ANALYSIS_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_COMPLETE			4
+
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..9344654 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -956,6 +956,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507..76d31dc 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..89fd1d1 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,24 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS scanning_table,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'analyzing sample'::text
+            WHEN 3 THEN 'analyzing sample (extended stats)'::text
+            WHEN 4 THEN 'analyzing complete'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param4 AS heap_blks_total,
+    s.param5 AS heap_blks_scanned
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

#11

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Robert Haas (#8)

Re: progress report for ANALYZE

On 2019-Jul-08, Robert Haas wrote:

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Yeah, I got the impression that that was determined to be the desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

No commit, just discussion in the CREATE INDEX thread.

I propose that once a field is set, we should leave it set until the end.

Hmm, ok. In CREATE INDEX, we use the block counters multiple times. We
can leave them set until the next time we need them, I suppose.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#12

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#11)

Re: progress report for ANALYZE

On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Hmm, ok. In CREATE INDEX, we use the block counters multiple times.

Why do we do that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#13

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Robert Haas (#12)

Re: progress report for ANALYZE

On 2019-Jul-10, Robert Haas wrote:

On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Hmm, ok. In CREATE INDEX, we use the block counters multiple times.

Why do we do that?

Because we scan the table first, then the index, then the table again
(last two for the validation phase of CIC). We count "block numbers"
separately for each of those, and keep those counters in the same pair
of columns. I think we also do that for tuple counters in one case.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#14

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#13)

Re: progress report for ANALYZE

On Wed, Jul 10, 2019 at 9:26 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Jul-10, Robert Haas wrote:

On Tue, Jul 9, 2019 at 6:12 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

Hmm, ok. In CREATE INDEX, we use the block counters multiple times.

Why do we do that?

Because we scan the table first, then the index, then the table again
(last two for the validation phase of CIC). We count "block numbers"
separately for each of those, and keep those counters in the same pair
of columns. I think we also do that for tuple counters in one case.

Hmm. I think I would have been inclined to use different counter
numbers for table blocks and index blocks, but perhaps we don't have
room. Anyway, leaving them set until we need them again seems like the
best we can do as things stand.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#15

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 6 years ago

In reply to: Tatsuro Yamada (#10)

Re: progress report for ANALYZE

Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in <244cb241-168b-d6a9-c45f-a80c34cdc6ad@nttcom.co.jp>

Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com>
wrote:

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Yeah, I got the impression that that was determined to be the
desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the
end.

+1
Note that this patch is already behaving like that if the table only
contains dead rows.

+1 from me.

I fixed the patch including:

- Replace "if" to "else if". (Suggested by Julien)
- Fix typo s/ech/each/. (Suggested by Anthony)
- Add Phase "analyzing complete" in the pgstat view. (Suggested by
- Julien, Robert and me)
It was overlooked to add it in system_views.sql.

I share my re-test result, see below:

---------------------------------------------------------
[Session #1]
create table hoge as select * from generate_series(1, 1000000) a;
analyze verbose hoge;

[Session #2]
\a \t
select * from pg_stat_progress_analyze; \watch 0.001

3785|13599|postgres|16384|f|16384|scanning table|4425|6
3785|13599|postgres|16384|f|16384|scanning table|4425|31
3785|13599|postgres|16384|f|16384|scanning table|4425|70
3785|13599|postgres|16384|f|16384|scanning table|4425|109
...
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|scanning table|4425|4425
3785|13599|postgres|16384|f|16384|analyzing sample|0|0
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------

I have some comments.

+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_BLOCKS_DONE
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_ANALYSIS,
+			0, 0

Do you oppose to leaving the total/done blocks alone here:p?

+ BlockNumber nblocks;
+ double blksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:

+    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                   ++blksdone);

It is converted into int64.

+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'

I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:

+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'

+ WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:

+                      WHEN 4 THEN 'completing analyze'
+                      WHEN 4 THEN 'finalizing analyze'

+ <entry>Process ID of backend.</entry>

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.

+      <entry>OID of the database to which this backend is connected.</entry>
+      <entry>Name of the database to which this backend is connected.</entry>

"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)

+ <entry>Whether the current scan includes legacy inheritance children.</entry>

This apparently excludes partition tables but actually it is
included.

"Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..

+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).

Is <literal> needed? And I think the parentheses are not needed.

OID of the table currently being scanned. Can differ from relid
when analyzing tables that have child tables.

+ Total number of heap blocks to scan in the current table.

Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".

+ The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.

+       <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.

Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.

regards.

-
Kyotaro Horiguchi
NTT Open Source Software Center

#16

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

over 6 years ago

In reply to: Kyotaro Horiguchi (#15)

1 attachment(s)

Re: progress report for ANALYZE

Hi Horiguchi-san!

On 2019/07/11 19:56, Kyotaro Horiguchi wrote:

Hello.

At Tue, 9 Jul 2019 17:38:44 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in <244cb241-168b-d6a9-c45f-a80c34cdc6ad@nttcom.co.jp>

Hi Alvaro, Anthony, Julien and Robert,

On 2019/07/09 3:47, Julien Rouhaud wrote:

On Mon, Jul 8, 2019 at 8:44 PM Robert Haas <robertmhaas@gmail.com>
wrote:

On Mon, Jul 8, 2019 at 2:18 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Yeah, I got the impression that that was determined to be the
desirable
behavior, so I made it do that, but I'm not really happy about it
either. We're not too late to change the CREATE INDEX behavior, but
let's discuss what is it that we want.

I don't think I intended to make any such determination -- which
commit do you think established this as the canonical behavior?

I propose that once a field is set, we should leave it set until the
end.

+1
Note that this patch is already behaving like that if the table only
contains dead rows.

+1 from me.

3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------

I have some comments.
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_BLOCKS_DONE
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_ANALYSIS,
+			0, 0
Do you oppose to leaving the total/done blocks alone here:p?

Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)

Ah, I revised it to remove "0, 0".

+ BlockNumber nblocks;
+ double blksdone = 0;

Why is it a double even though blksdone is of the same domain
with nblocks? And finally:
+    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                   ++blksdone);
It is converted into int64.

Fixed.
But is it suitable to use BlockNumber instead int64?

+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'
I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+ WHEN 4 THEN 'analyzing complete'

And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:
+                      WHEN 4 THEN 'completing analyze'
+                      WHEN 4 THEN 'finalizing analyze'

I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

WHEN 2 THEN 'computing stats'
WHEN 3 THEN 'computing extended stats'
WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.

+ <entry>Process ID of backend.</entry>

of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.
+      <entry>OID of the database to which this backend is connected.</entry>
+      <entry>Name of the database to which this backend is connected.</entry>
"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)

I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)

+ <entry>Whether the current scan includes legacy inheritance children.</entry>

This apparently excludes partition tables but actually it is
included.

"Whether scanning through child tables" ?

I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..

Hmm... I'm also not sure but I fixed as you suggested.

+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).
Is <literal> needed? And I think the parentheses are not needed.

OID of the table currently being scanned. Can differ from relid
when analyzing tables that have child tables.

How about:
OID of the table currently being scanned.
It might be different from relid when analyzing tables that have child tables.

+ Total number of heap blocks to scan in the current table.

Number of heap blocks on scanning_table to scan?

It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".

How about:
Total number of heap blocks in the scanning_table.

+ The command is currently scanning the table.

"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.

Fixed.

+       <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.
Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.

Fixed.

Please find attached patch. :)

Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;

9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================

Thanks,
Tatsuro Yamada

Attachments:

v3-0001-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v3-0001-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..bf09d69 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -361,6 +361,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </row>
 
      <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
+     <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
        <command>CLUSTER</command> or <command>VACUUM FULL</command>, showing current progress.
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3927,6 +3935,131 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether scanning through child tables.</entry>
+     </row>
+     <row>
+      <entry><structfield>scanning_table</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table currently being scanned.
+        It might be different from relid when analyzing tables that have child tables.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>heap_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks in the scanning_table.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>scanning table</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>scanning_table</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..ce3416e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS scanning_table,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'scanning table'
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'finalizing analyze'
+                      END AS phase,
+        S.param4 AS heap_blks_total, S.param5 AS heap_blks_scanned
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8d633f2..19c7042 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -494,6 +499,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	/*
 	 * Acquire the sample rows
 	 */
+	{
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_INH
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_SCAN_TABLE,
+			inh
+		};
+
+		pgstat_progress_update_multi_param(2, index, val);
+	}
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -513,7 +530,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTING);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -575,10 +595,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		}
 
 		/* Build extended statistics (if there are any). */
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED);
 		BuildRelationExtStatistics(onerel, totalrows, numrows, rows, attr_cnt,
 								   vacattrstats);
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1017,6 +1042,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	int64		blksdone = 0;
 
 	Assert(targrows > 0);
 
@@ -1026,7 +1053,20 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+	{
+		const int   index[] = {
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_SCANREL
+		};
+		const int64 val[] = {
+			nblocks,
+			RelationGetRelid(onerel)
+		};
+
+		pgstat_progress_update_multi_param(2, index, val);
+	}
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1087,6 +1127,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..db2cc5c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537..f7daece 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313..300bb58 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_INH					1
+#define PROGRESS_ANALYZE_SCANREL				2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			3
+#define PROGRESS_ANALYZE_BLOCKS_DONE			4
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE		1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING			2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE			4
+
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..9344654 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -956,6 +956,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507..76d31dc 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..691f46a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,24 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS scanning_table,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param4 AS heap_blks_total,
+    s.param5 AS heap_blks_scanned
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

#17

Kyotaro Horiguchi

horikyota.ntt@gmail.com

over 6 years ago

In reply to: Tatsuro Yamada (#16)

Re: progress report for ANALYZE

Hello.

# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..

At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in <0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp>

3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------
I have some comments.
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_BLOCKS_DONE
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_ANALYSIS,
+			0, 0
Do you oppose to leaving the total/done blocks alone here:p?
Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)

Ah, I revised it to remove "0, 0".

Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

+  BlockNumber  nblocks;
+  double    blksdone = 0;
Why is it a double even though blksdone is of the same domain
with nblocks? And finally:
+    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                   ++blksdone);
It is converted into int64.

Fixed.
But is it suitable to use BlockNumber instead int64?

Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.

+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'
I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'analyzing complete'
And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:
+                      WHEN 4 THEN 'completing analyze'
+                      WHEN 4 THEN 'finalizing analyze'

I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

WHEN 2 THEN 'computing stats'
WHEN 3 THEN 'computing extended stats'
WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.

Agreed. Especially such word choosing is not suitable for me..

+ <entry>Process ID of backend.</entry>
of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.
+ <entry>OID of the database to which this backend is
connected.</entry>
+ <entry>Name of the database to which this backend is
connected.</entry>
"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)
I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)

Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.

+ <entry>Whether the current scan includes legacy inheritance
children.</entry>
This apparently excludes partition tables but actually it is
included.

"Whether scanning through child tables" ?
I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..

Hmm... I'm also not sure but I fixed as you suggested.

Yeah, I also am not sure the suggestion is good enough as is..

+       The table being scanned (differs from <literal>relid</literal>
+       only when processing partitions or inheritance children).
Is <literal> needed? And I think the parentheses are not needed.
OID of the table currently being scanned. Can differ from relid
when analyzing tables that have child tables.
How about:
OID of the table currently being scanned.
It might be different from relid when analyzing tables that have child
tables.

+ Total number of heap blocks to scan in the current table.
Number of heap blocks on scanning_table to scan?
It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".

How about:
Total number of heap blocks in the scanning_table.

(For me, ) that seems like it shows blocks including all
descendents for inheritance parent. But I'm not sure..a

+ The command is currently scanning the table.
"sample(s)" comes somewhat abruptly in the next item. Something
like "The command is currently scanning the table
<structfield>scanning_table</structfield> to obtain samples"
might be better.

Fixed.
+ <command>ANALYZE</command> is currently extracting statistical data
+       from the sample obtained.
Something like "The command is computing stats from the samples
obtained in the previous phase" might be better.
Fixed.

Please find attached patch. :)

Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;

9067|13599|postgres|16387|f|16387|scanning table|443|14
9067|13599|postgres|16387|f|16387|scanning table|443|44
9067|13599|postgres|16387|f|16387|scanning table|443|76
9067|13599|postgres|16387|f|16387|scanning table|443|100
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================

Looks fine!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

#18

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

over 6 years ago

In reply to: Kyotaro Horiguchi (#17)

1 attachment(s)

Re: progress report for ANALYZE

Hi Horiguchi-san, Alvaro, Anthony, Julien and Robert,

On 2019/07/22 17:30, Kyotaro Horiguchi wrote:

Hello.

# It's very good timing, as you came in while I have a time after
# finishing a quite nerve-wrackig task..

At Mon, 22 Jul 2019 15:02:16 +0900, Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote in <0876b4fe-26fb-ca32-f179-c696fa3ddfec@nttcom.co.jp>
3785|13599|postgres|16384|f|16384|analyzing complete|0|0 <-- Added and
fixed. :)
---------------------------------------------------------
I have some comments.
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_BLOCKS_DONE
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_ANALYSIS,
+			0, 0
Do you oppose to leaving the total/done blocks alone here:p?
Thanks for your comments!
I created a new patch based on your comments because Alvaro allowed me
to work on revising the patch. :)

Ah, I revised it to remove "0, 0".
Thanks. For a second I thought that
PROGRESS_ANALYZE_PHASE_ANALYSIS was lost but it is living being
renamed to PROGRESS_ANALYZE_PHASE_ANALYSIS.

"PROGRESS_ANALYZE_PHASE_ANALYSIS" was replaced with
"PROGRESS_ANALYZE_PHASE_COMPUTING" because I changed
the phase-name on v3.patch like this:

./src/include/commands/progress.h

+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE              1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING                       2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE                        4

Is it Okay?

+  BlockNumber  nblocks;
+  double    blksdone = 0;
Why is it a double even though blksdone is of the same domain
with nblocks? And finally:
+    pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+                   ++blksdone);
It is converted into int64.
Fixed.
But is it suitable to use BlockNumber instead int64?
Yeah, I didn't meant that we should use int64 there. Sorry for
the confusing comment. I meant that blksdone should be of
BlockNumber.

Fixed. Thanks for the clarification. :)
Attached v4 patch file only includes this fix.

+                      WHEN 2 THEN 'analyzing sample'
+                      WHEN 3 THEN 'analyzing sample (extended stats)'
I think we should avoid parenthesized phrases as far as
not-necessary. That makes the column unnecessarily long. The
phase is internally called as "compute stats" so *I* would prefer
something like the followings:
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'analyzing complete'
And if you are intending by this that (as mentioned in the
documentation) "working to complete this analyze", this would be:
+                      WHEN 4 THEN 'completing analyze'
+                      WHEN 4 THEN 'finalizing analyze'
I have no strong opinion, so I changed the phase-names based on
your suggestions like following:

WHEN 2 THEN 'computing stats'
WHEN 3 THEN 'computing extended stats'
WHEN 4 THEN 'finalizing analyze'

However, I'd like to get any comments from hackers to get a consensus
about the names.
Agreed. Especially such word choosing is not suitable for me..

To Alvaro, Julien, Anthony and Robert,
Do you have any ideas? :)

+ <entry>Process ID of backend.</entry>
of "the" backend. ? And period is not attached on all
descriptions consisting of a single sentence.
+ <entry>OID of the database to which this backend is
connected.</entry>
+ <entry>Name of the database to which this backend is
connected.</entry>
"database to which .. is connected" is phrased as "database this
backend is connected to" in pg_stat_acttivity. (Just for consistency)
I checked the sentences on other views of progress monitor (VACUUM,
CREATE INDEX and CLUSTER), and they are same sentence. Therefore,
I'd like to keep it as is. :)
Oh, I see from where the wordings came. But no periods seen after
sentense when it is the only one in a description in other system
views tables. I think the progress views tables should be
corrected following convention.

Sounds reasonable. However, I'd like to create another patch after
this feature was committed since that document fix influence other
progress monitor's description on the document.

+ <entry>Whether the current scan includes legacy inheritance
children.</entry>
This apparently excludes partition tables but actually it is
included.

"Whether scanning through child tables" ?
I'm not sure "child tables" is established as the word to mean
both "partition tables and inheritance children"..

Hmm... I'm also not sure but I fixed as you suggested.

Yeah, I also am not sure the suggestion is good enough as is..

+ Total number of heap blocks to scan in the current table.
Number of heap blocks on scanning_table to scan?
It might be better that this description describes that this
and the next column is meaningful only while the phase
"scanning table".

How about:
Total number of heap blocks in the scanning_table.

(For me, ) that seems like it shows blocks including all
descendents for inheritance parent. But I'm not sure..a

In the case of scanning_table is parent table, it doesn't
show the number. However, child tables shows the number.
I tested it using Declarative partitioning table, See the bottom
of this email. :)

Please find attached patch. :)

Here is a test result of the patch.
==========================================================
# select * from pg_stat_progress_analyze ; \watch 0.0001;

9067|13599|postgres|16387|f|16387|scanning table|443|14
...
9067|13599|postgres|16387|f|16387|scanning table|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|computing stats|443|443
9067|13599|postgres|16387|f|16387|finalizing analyze|443|443
==========================================================

Looks fine!

I shared a test result using Declarative partitioning table.

==========================================================
## Create partitioning table
create table hoge as select a from generate_series(0, 40000) a;

create table hoge2(
a integer
) partition by range(a);

create table hoge2_10000 partition of hoge2
for values from (0) to (10000);

create table hoge2_20000 partition of hoge2
for values from (10000) to (20000);

create table hoge2_30000 partition of hoge2
for values from (20000) to (30000);

create table hoge2_default partition of hoge2 default;

## Test
select oid,relname,relpages,reltuples from pg_class where relname like 'hoge%';

oid | relname | relpages | reltuples
-------+---------------+----------+-----------
16538 | hoge | 177 | 40001
16541 | hoge2 | 0 | 0
16544 | hoge2_10000 | 45 | 10000
16547 | hoge2_20000 | 45 | 10000
16550 | hoge2_30000 | 45 | 10000
16553 | hoge2_default | 45 | 10001
(6 rows)

select * from pg_stat_progress_analyze ; \watch 0.00001;

27579|13599|postgres|16541|t|16544|scanning table|45|17
27579|13599|postgres|16541|t|16544|scanning table|45|38
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45
27579|13599|postgres|16541|t|16544|scanning table|45|45

27579|13599|postgres|16541|t|16547|scanning table|45|17
27579|13599|postgres|16541|t|16547|scanning table|45|37
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45
27579|13599|postgres|16541|t|16547|scanning table|45|45

27579|13599|postgres|16541|t|16550|scanning table|45|9
27579|13599|postgres|16541|t|16550|scanning table|45|30
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45
27579|13599|postgres|16541|t|16550|scanning table|45|45

27579|13599|postgres|16541|t|16553|scanning table|45|5
27579|13599|postgres|16541|t|16553|scanning table|45|26
27579|13599|postgres|16541|t|16553|scanning table|45|42
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|scanning table|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|computing stats|45|45
27579|13599|postgres|16541|t|16553|finalizing analyze|45|45

27579|13599|postgres|16544|f|16544|scanning table|45|1
27579|13599|postgres|16544|f|16544|scanning table|45|30
27579|13599|postgres|16544|f|16544|computing stats|45|45

27579|13599|postgres|16547|f|16547|scanning table|45|25
27579|13599|postgres|16547|f|16547|computing stats|45|45

27579|13599|postgres|16550|f|16550|scanning table|45|11
27579|13599|postgres|16550|f|16550|scanning table|45|38
27579|13599|postgres|16550|f|16550|finalizing analyze|45|45

27579|13599|postgres|16553|f|16553|scanning table|45|25
27579|13599|postgres|16553|f|16553|computing stats|45|45
==========================================================

I'll share test result using partitioning table via
Inheritance tables on next email. :)

Thanks,
Tatsuro Yamada

Attachments:

v4-0001-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v4-0001-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..bf09d69 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -361,6 +361,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </row>
 
      <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
+     <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
        <command>CLUSTER</command> or <command>VACUUM FULL</command>, showing current progress.
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3927,6 +3935,131 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether scanning through child tables.</entry>
+     </row>
+     <row>
+      <entry><structfield>scanning_table</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table currently being scanned.
+        It might be different from relid when analyzing tables that have child tables.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>heap_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks in the scanning_table.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>scanning table</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>scanning_table</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..ce3416e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS scanning_table,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'scanning table'
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'finalizing analyze'
+                      END AS phase,
+        S.param4 AS heap_blks_total, S.param5 AS heap_blks_scanned
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 8d633f2..58d8026 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -494,6 +499,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	/*
 	 * Acquire the sample rows
 	 */
+	{
+		const int   index[] = {
+			PROGRESS_ANALYZE_PHASE,
+			PROGRESS_ANALYZE_INH
+		};
+		const int64 val[] = {
+			PROGRESS_ANALYZE_PHASE_SCAN_TABLE,
+			inh
+		};
+
+		pgstat_progress_update_multi_param(2, index, val);
+	}
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -513,7 +530,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTING);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -575,10 +595,15 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		}
 
 		/* Build extended statistics (if there are any). */
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED);
 		BuildRelationExtStatistics(onerel, totalrows, numrows, rows, attr_cnt,
 								   vacattrstats);
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1017,6 +1042,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	BlockNumber	blksdone = 0;
 
 	Assert(targrows > 0);
 
@@ -1026,7 +1053,20 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+	{
+		const int   index[] = {
+			PROGRESS_ANALYZE_TOTAL_BLOCKS,
+			PROGRESS_ANALYZE_SCANREL
+		};
+		const int64 val[] = {
+			nblocks,
+			RelationGetRelid(onerel)
+		};
+
+		pgstat_progress_update_multi_param(2, index, val);
+	}
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1087,6 +1127,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..db2cc5c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537..f7daece 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313..300bb58 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_INH					1
+#define PROGRESS_ANALYZE_SCANREL				2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			3
+#define PROGRESS_ANALYZE_BLOCKS_DONE			4
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE		1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING			2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE			4
+
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..9344654 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -956,6 +956,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507..76d31dc 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..691f46a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,24 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS scanning_table,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param4 AS heap_blks_total,
+    s.param5 AS heap_blks_scanned
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

#19

Thomas Munro

thomas.munro@gmail.com

over 6 years ago

In reply to: Tatsuro Yamada (#18)

Re: progress report for ANALYZE

On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

Attached v4 patch file only includes this fix.

Hello all,

I've moved this to the September CF, where it is in "Needs review" state.

Thanks,

--
Thomas Munro
https://enterprisedb.com

#20

Robert Haas

robertmhaas@gmail.com

over 6 years ago

In reply to: Thomas Munro (#19)

Re: progress report for ANALYZE

On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

Attached v4 patch file only includes this fix.

I've moved this to the September CF, where it is in "Needs review" state.

/me reviews.

+ <entry><structfield>scanning_table</structfield></entry>

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.

+ The command is computing extended stats from the samples
obtained in the previous phase.

I think you should change this (and the previous one) to say "from the
samples obtained during the table scan."

+ Total number of heap blocks in the scanning_table.

Perhaps I'm confused, but it looks to me like what you are advertising
is the number of blocks that will be sampled, not the total number of
blocks in the table. I think that's the right thing to advertise, but
then the column should be named and documented that way.

+ {
+ const int   index[] = {
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_SCANREL
+ };
+ const int64 val[] = {
+ nblocks,
+ RelationGetRelid(onerel)
+ };
+
+ pgstat_progress_update_multi_param(2, index, val);
+ }

This block seems to be introduced just so you can declare variables; I
don't think that's good style. It's arguably unnecessary because we
now are selectively allowing variable declarations within functions,
but I think you should just move the first array to the top of the
function and the second declaration to the top of the function
dropping const, and then just do val[0] = nblocks and val[1] =
RelationGetRelid(onerel). Maybe you can also come up with better
names than 'index' and 'val'. Same comment applies to another place
where you have something similar.

Patch seems to need minor rebasing.

Maybe "scanning table" should be renamed "acquiring sample rows," to
match the names used in the code?

I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count. That creates a race during which users could see the first bit
of data set and the second not set. I don't see any reason not to set
all four fields together.

Please be sure to make the names of the constants you use match up
with the external names as far as it reasonably makes sense, e.g.

+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE       1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING            2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE         4

vs.

+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text

Not terrible, but it could be closer.

Similarly with the column names (include_children vs. INH).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

#21

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

over 6 years ago

In reply to: Robert Haas (#20)

1 attachment(s)

Re: progress report for ANALYZE

Hi Robert and All!

On 2019/08/02 2:48, Robert Haas wrote:> On Thu, Aug 1, 2019 at 4:45 AM Thomas Munro <thomas.munro@gmail.com> wrote:

On Tue, Jul 23, 2019 at 4:51 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

Attached v4 patch file only includes this fix.

I've moved this to the September CF, where it is in "Needs review" state.

/me reviews.

Thanks for your comments! :)

+ <entry><structfield>scanning_table</structfield></entry>

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.

Fixed.
I changed "scanning_table" to "current_relid" for analyze in monitoring.sgml.
However, I didn't change "relid" on other places for other commands because
I'd like to create other patch for that later. :)

+ The command is computing extended stats from the samples
obtained in the previous phase.

I think you should change this (and the previous one) to say "from the
samples obtained during the table scan."

Fixed.

+ Total number of heap blocks in the scanning_table.

Perhaps I'm confused, but it looks to me like what you are advertising
is the number of blocks that will be sampled, not the total number of
blocks in the table. I think that's the right thing to advertise, but
then the column should be named and documented that way.

Ah, you are right. Fixed.
I used the following sentence based on Vinayak's patch created two years a go.

- <entry><structfield>heap_blks_total</structfield></entry>
- <entry><type>bigint</type></entry>
- <entry>
- Total number of heap blocks in the current_relid.
- </entry>

+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+</entry>

+ {
+ const int   index[] = {
+ PROGRESS_ANALYZE_TOTAL_BLOCKS,
+ PROGRESS_ANALYZE_SCANREL
+ };
+ const int64 val[] = {
+ nblocks,
+ RelationGetRelid(onerel)
+ };
+
+ pgstat_progress_update_multi_param(2, index, val);
+ }
This block seems to be introduced just so you can declare variables; I
don't think that's good style. It's arguably unnecessary because we
now are selectively allowing variable declarations within functions,
but I think you should just move the first array to the top of the
function and the second declaration to the top of the function
dropping const, and then just do val[0] = nblocks and val[1] =
RelationGetRelid(onerel). Maybe you can also come up with better
names than 'index' and 'val'. Same comment applies to another place
where you have something similar.

I agreed and fixed.

Patch seems to need minor rebasing.

Maybe "scanning table" should be renamed "acquiring sample rows," to
match the names used in the code?

I fixed as following:

s/PROGRESS_ANALYZE_PHASE_SCAN_TABLE/
PROGRESS_ANALYZE_PHASE_ACQUIRING_SAMPLE_ROWS/

s/WHEN 1 THEN 'scanning table'::text/
WHEN 1 THEN 'acquiring sample rows'::text/

I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count. That creates a race during which users could see the first bit
of data set and the second not set. I don't see any reason not to set
all four fields together.

Hmm... I understand but it's little difficult because if there are
child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
(See below). So, it is able to set all four fields together if inh flag
is given as a parameter of those functions, I suppose. But I'm not sure
whether it's okay to add the parameter to both functions or not.
Do you have any ideas? :)

# do_analyze_rel()
...
if (inh)
numrows = acquire_inherited_sample_rows(onerel, elevel,
rows, targrows,
&totalrows, &totaldeadrows);
else
numrows = (*acquirefunc) (onerel, elevel,
rows, targrows,
&totalrows, &totaldeadrows);

# acquire_inherited_sample_rows()
...
foreach(lc, tableOIDs)
{
...
/* Check table type (MATVIEW can't happen, but might as well allow) */
if (childrel->rd_rel->relkind == RELKIND_RELATION ||
childrel->rd_rel->relkind == RELKIND_MATVIEW)
{
/* Regular table, so use the regular row acquisition function */
acquirefunc = acquire_sample_rows;
...
/* OK, we'll process this child */
has_child = true;
rels[nrels] = childrel;
acquirefuncs[nrels] = acquirefunc;
...
}
...
for (i = 0; i < nrels; i++)
{
...
AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
...
if (childtargrows > 0)
{
...
/* Fetch a random sample of the child's rows */
childrows = (*acquirefunc) (childrel, elevel,
rows + numrows, childtargrows,
&trows, &tdrows)

Please be sure to make the names of the constants you use match up
with the external names as far as it reasonably makes sense, e.g.

+#define PROGRESS_ANALYZE_PHASE_SCAN_TABLE       1
+#define PROGRESS_ANALYZE_PHASE_COMPUTING            2
+#define PROGRESS_ANALYZE_PHASE_COMPUTING_EXTENDED 3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE         4

vs.

+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text

Not terrible, but it could be closer.

Agreed.
How about these?:

#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS 1 <- fixed
#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS 2 <- fixed
#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS 3 <- fixed
#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE 4 <- fixed

vs.

WHEN 1 THEN 'acquiring sample rows'::text
WHEN 2 THEN 'computing stats'::text
WHEN 3 THEN 'computing extended stats'::text
WHEN 4 THEN 'finalizing analyze'::text

I revised the name of the constants, so the constants and the phase
names are closer than before. Also, I used Verb instead Gerund
because other phase names used Verb such as VACUUM. :)

Similarly with the column names (include_children vs. INH).

Fixed.
I selected "include_children" as the column name because it's
easy to understand than "INH".

s/PROGRESS_ANALYZE_INH/
PROGRESS_ANALYZE_INCLUDE_CHILDREN/

Please find attached file. :)

Thanks,
Tatsuro Yamada

Attachments:

v5-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v5-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..4348b60 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -361,6 +361,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </row>
 
      <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
+     <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
        <command>CLUSTER</command> or <command>VACUUM FULL</command>, showing current progress.
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3927,6 +3935,131 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether scanning through child tables.</entry>
+     </row>
+     <row>
+      <entry><structfield>current_relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table currently being scanned.
+        It might be different from relid when analyzing tables that have child tables.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring sample rows</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>current_relid</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained during the table scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..cc0c629 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS current_relid,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'acquiring sample rows'
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'finalizing analyze'
+                      END AS phase,
+        S.param4 AS sample_blks_total, S.param5 AS heap_blks_scanned
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 92285e8..a3802bb 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -317,6 +322,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	const int   st_index[] = {
+		PROGRESS_ANALYZE_PHASE,
+		PROGRESS_ANALYZE_INCLUDE_CHILDREN
+	};
+	int64 st_val[2];
 
 	if (inh)
 		ereport(elevel,
@@ -494,6 +504,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	/*
 	 * Acquire the sample rows
 	 */
+	st_val[0] =	PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS;
+	st_val[1] =	inh;
+	pgstat_progress_update_multi_param(2, st_index, st_val);
+	
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -513,7 +527,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -581,10 +598,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * not for relations representing inheritance trees.
 		 */
 		if (!inh)
+		{
+			pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+										 PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS);
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 									   attr_cnt, vacattrstats);
+		}
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1023,6 +1047,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	BlockNumber	blksdone = 0;
+	const int   sb_index[] = {
+		PROGRESS_ANALYZE_TOTAL_BLOCKS,
+		PROGRESS_ANALYZE_SCANREL
+	};
+	int64 sb_val[2];
 
 	Assert(targrows > 0);
 
@@ -1032,7 +1063,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+
+	/* Report sampling block numbers */
+	sb_val[0] = nblocks;
+	sb_val[1] = RelationGetRelid(onerel);
+	pgstat_progress_update_multi_param(2, sb_index, sb_val);
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1093,6 +1130,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..db2cc5c 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537..f7daece 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313..d99e861 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_INCLUDE_CHILDREN		1
+#define PROGRESS_ANALYZE_SCANREL				2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			3
+#define PROGRESS_ANALYZE_BLOCKS_DONE			4
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS	1
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS		2
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS	3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE		4
+
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a..9344654 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -956,6 +956,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507..76d31dc 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..909931a 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,24 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS acquiring sample rows,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'scanning table'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param4 AS sample_blks_total,
+    s.param5 AS heap_blks_scanned
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

#22

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Tatsuro Yamada (#4)

Re: progress report for ANALYZE

Hello,

On 2019-Jul-03, Tatsuro Yamada wrote:

My ex-colleague Vinayak created same patch in 2017 [1], and he
couldn't get commit because there are some reasons such as the
patch couldn't handle analyzing Foreign table. Therefore, I wonder
whether your patch is able to do that or not.

[1] ANALYZE command progress checker
/messages/by-id/968b4eda-2417-8b7b-d468-71643cf088b6@openscg.com

So just now I went to check the jold thread (which I should have
searched for before writing my own implementation!). It seems clear
that many things are pretty similar in both patch, but I think for the
most part they are similar just because the underlying infrastructure
imposes a certain design already, rather than there being any actual
copying. (To be perfectly clear: I didn't even know that this patch
existed and I didn't grab any code from there to produce my v1.)

However, you've now modified the patch from what I submitted and I'm
wondering if you've taken any inspiration from Vinayak's old patch. If
so, it seems fair to credit him as co-author in the commit message. It
would be good to get his input on the current patch, though.

I have not looked at the current version of the patch yet, but I intend
to do so during the upcoming commitfest.

Thanks for moving this forward!

On the subject of FDW support: I did look into supporting that before
submitting this. I think it's not academically difficult: just have the
FDW's acquire_sample_rows callback invoke the update_param functions
once in a while. Sadly, in practical terms it looks like postgres_fdw
is quite stupid about ANALYZE (it scans the whole table??) so doing
something that's actually useful may not be so easy. At least, we know
the total relation size and maybe we can add the ctid column to the
cursor in postgresAcquireSampleRowsFunc so that we have a current block
number to report (becing careful about synchronized seqscans). I think
this should *not* be part of the main ANALYZE-progress commit, though,
because getting that properly sorted out is going to take some more
time.

I do wonder why doesn't postgres_fdw use TABLESAMPLE.

I did not look at other FDWs at all, mind.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#23

Etsuro Fujita

etsuro.fujita@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#22)

Re: progress report for ANALYZE

Hi,

On Tue, Aug 13, 2019 at 11:01 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

On the subject of FDW support: I did look into supporting that before
submitting this. I think it's not academically difficult: just have the
FDW's acquire_sample_rows callback invoke the update_param functions
once in a while. Sadly, in practical terms it looks like postgres_fdw
is quite stupid about ANALYZE (it scans the whole table??) so doing
something that's actually useful may not be so easy. At least, we know
the total relation size and maybe we can add the ctid column to the
cursor in postgresAcquireSampleRowsFunc so that we have a current block
number to report (becing careful about synchronized seqscans).

I don't follow this thread fully, so I might miss something, but I
don't think that's fully applicable, because foreign tables managed by
postgres_fdw can be eg, views on the remote side.

I do wonder why doesn't postgres_fdw use TABLESAMPLE.

Yeah, that's really what I'm thinking for PG13; but I think we would
still need to scan the whole table in some cases (eg, when the foreign
table is a view on the remote side), because the TABLESAMLE clause can
only be applied to regular tables and materialized views.

I did not look at other FDWs at all, mind.

IIUC, oracle_fdw already uses the SAMPLE BLOCK clause for that. Right?

Best regards,
Etsuro Fujita

#24

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Etsuro Fujita (#23)

Re: progress report for ANALYZE

On 2019-Aug-14, Etsuro Fujita wrote:

On Tue, Aug 13, 2019 at 11:01 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

On the subject of FDW support: I did look into supporting that before
submitting this. I think it's not academically difficult: just have the
FDW's acquire_sample_rows callback invoke the update_param functions
once in a while. Sadly, in practical terms it looks like postgres_fdw
is quite stupid about ANALYZE (it scans the whole table??) so doing
something that's actually useful may not be so easy. At least, we know
the total relation size and maybe we can add the ctid column to the
cursor in postgresAcquireSampleRowsFunc so that we have a current block
number to report (becing careful about synchronized seqscans).

I don't follow this thread fully, so I might miss something, but I
don't think that's fully applicable, because foreign tables managed by
postgres_fdw can be eg, views on the remote side.

Oh, hmm, well I guess that covers the tables and matviews then ... I'm
not sure there's a good way to cover foreign tables that are views or
other stuff. Maybe that'll have to do. But at least it covers more
cases than none.

I do wonder why doesn't postgres_fdw use TABLESAMPLE.

Yeah, that's really what I'm thinking for PG13; but I think we would
still need to scan the whole table in some cases (eg, when the foreign
table is a view on the remote side), because the TABLESAMLE clause can
only be applied to regular tables and materialized views.

Sure.

I did not look at other FDWs at all, mind.

IIUC, oracle_fdw already uses the SAMPLE BLOCK clause for that. Right?

Yeah, it does that, I checked precisely oracle_fdw this morning.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#25

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

over 6 years ago

In reply to: Alvaro Herrera (#22)

Re: progress report for ANALYZE

Hi Alvaro,

On 2019/08/13 23:01, Alvaro Herrera wrote:

Hello,

On 2019-Jul-03, Tatsuro Yamada wrote:

My ex-colleague Vinayak created same patch in 2017 [1], and he
couldn't get commit because there are some reasons such as the
patch couldn't handle analyzing Foreign table. Therefore, I wonder
whether your patch is able to do that or not.

[1] ANALYZE command progress checker
/messages/by-id/968b4eda-2417-8b7b-d468-71643cf088b6@openscg.com

So just now I went to check the jold thread (which I should have
searched for before writing my own implementation!). It seems clear
that many things are pretty similar in both patch, but I think for the
most part they are similar just because the underlying infrastructure
imposes a certain design already, rather than there being any actual
copying. (To be perfectly clear: I didn't even know that this patch
existed and I didn't grab any code from there to produce my v1.)

I know your patch was not based on Vinayak's old patch because
coding style is different between him and you.

However, you've now modified the patch from what I submitted and I'm
wondering if you've taken any inspiration from Vinayak's old patch. If
so, it seems fair to credit him as co-author in the commit message. It
would be good to get his input on the current patch, though.

Yeah, I'm happy if you added his name as co-authors because I checked
the document including his old patch as a reference. :)

I have not looked at the current version of the patch yet, but I intend
to do so during the upcoming commitfest.

Thanks for moving this forward!

Thanks!
Committing the patch on PG13 makes me happy because Progress reporting
features are important for DBA. :)

On the subject of FDW support: I did look into supporting that before
submitting this. I think it's not academically difficult: just have the
FDW's acquire_sample_rows callback invoke the update_param functions
once in a while. Sadly, in practical terms it looks like postgres_fdw

Actually, I've changed my mind.
Even if there is no FDW support, Progress report for ANALYZE is still useful. Therefore, FDW support would be preferable but not required for committing
the patch, I believe. :)

Thanks,
Tatsuro Yamada

#26

Alvaro Herrera

alvherre@2ndquadrant.com

over 6 years ago

In reply to: Tatsuro Yamada (#25)

1 attachment(s)

Re: progress report for ANALYZE

There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments:

v6-Report-progress-for-ANALYZE.patchtext/x-diff; charset=us-asciiDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..630f9821a9 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3927,6 +3935,132 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether scanning through child tables.</entry>
+     </row>
+     <row>
+      <entry><structfield>current_relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table currently being scanned.
+        It might be different from relid when analyzing tables that have child tables.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring sample rows</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>current_relid</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained during the table scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..cc0c6291d9 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS current_relid,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'acquiring sample rows'
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'finalizing analyze'
+                      END AS phase,
+        S.param4 AS sample_blks_total, S.param5 AS heap_blks_scanned
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index d2fdf447b6..6380e4cbbb 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -317,6 +322,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	const int   st_index[] = {
+		PROGRESS_ANALYZE_PHASE,
+		PROGRESS_ANALYZE_INCLUDE_CHILDREN
+	};
+	int64 st_val[2];
 
 	if (inh)
 		ereport(elevel,
@@ -494,6 +504,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	/*
 	 * Acquire the sample rows
 	 */
+	st_val[0] =	PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS;
+	st_val[1] =	inh;
+	pgstat_progress_update_multi_param(2, st_index, st_val);
+
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -513,7 +527,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -581,10 +598,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * not for relations representing inheritance trees.
 		 */
 		if (!inh)
+		{
+			pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+										 PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS);
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 									   attr_cnt, vacattrstats);
+		}
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1023,6 +1047,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	BlockNumber	blksdone = 0;
+	const int   sb_index[] = {
+		PROGRESS_ANALYZE_TOTAL_BLOCKS,
+		PROGRESS_ANALYZE_SCANREL
+	};
+	int64 sb_val[2];
 
 	Assert(targrows > 0);
 
@@ -1032,7 +1063,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+
+	/* Report sampling block numbers */
+	sb_val[0] = nblocks;
+	sb_val[1] = RelationGetRelid(onerel);
+	pgstat_progress_update_multi_param(2, sb_index, sb_val);
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1093,6 +1130,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..db2cc5c316 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537979..f7daece5ee 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313cb3..d99e8613d2 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_INCLUDE_CHILDREN		1
+#define PROGRESS_ANALYZE_SCANREL				2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			3
+#define PROGRESS_ANALYZE_BLOCKS_DONE			4
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS	1
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS		2
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS	3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE		4
+
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..df418af4e7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -955,6 +955,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507fb5..76d31dc126 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..638fb6acca 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,24 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS current_relid,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'acquiring sample rows'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param4 AS sample_blks_total,
+    s.param5 AS heap_blks_scanned
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

#27

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

over 6 years ago

In reply to: Alvaro Herrera (#26)

Re: progress report for ANALYZE

Hi Alvaro,

There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.

Thanks for fixing that. :)
I'll test it later.

I think we have to address the following comment from Robert. Right?
Do you have any ideas?

I'm not a fan of the way you set the scan-table phase and inh flag in
one place, and then slightly later set the relation OID and block
count. That creates a race during which users could see the first bit
of data set and the second not set. I don't see any reason not to set
all four fields together.

Hmm... I understand but it's little difficult because if there are
child rels, acquire_inherited_sample_rows() calls acquire_sample_rows()
(See below). So, it is able to set all four fields together if inh flag
is given as a parameter of those functions, I suppose. But I'm not sure
whether it's okay to add the parameter to both functions or not.
Do you have any ideas?

# do_analyze_rel()
...
if (inh)
numrows = acquire_inherited_sample_rows(onerel, elevel,
rows, targrows,
&totalrows, &totaldeadrows);
else
numrows = (*acquirefunc) (onerel, elevel,
rows, targrows,
&totalrows, &totaldeadrows);

# acquire_inherited_sample_rows()
...
foreach(lc, tableOIDs)
{
...
/* Check table type (MATVIEW can't happen, but might as well allow) */
if (childrel->rd_rel->relkind == RELKIND_RELATION ||
childrel->rd_rel->relkind == RELKIND_MATVIEW)
{
/* Regular table, so use the regular row acquisition function */
acquirefunc = acquire_sample_rows;
...
/* OK, we'll process this child */
has_child = true;
rels[nrels] = childrel;
acquirefuncs[nrels] = acquirefunc;
...
}
...
for (i = 0; i < nrels; i++)
{
...
AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
...
if (childtargrows > 0)
{
...
/* Fetch a random sample of the child's rows */
childrows = (*acquirefunc) (childrel, elevel,
rows + numrows, childtargrows,
&trows, &tdrows)

Thanks,
Tatsuro Yamada

#28

vignesh C

vignesh21@gmail.com

over 6 years ago

In reply to: Alvaro Herrera (#26)

Re: progress report for ANALYZE

On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.

+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
In the above after "." there is an extra space, is this intentional. I
had noticed that in lot of places there is couple of spaces and
sometimes single space across this file.

Like in below, there is single space after ".":
<entry>Time when this process' current transaction was started, or null
if no transaction is active. If the current
query is the first of its transaction, this column is equal to the
<structfield>query_start</structfield> column.
</entry>

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com

#29

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: vignesh C (#28)

Re: progress report for ANALYZE

Hi vignesh!

On 2019/09/17 20:51, vignesh C wrote:

On Thu, Sep 5, 2019 at 2:31 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

There were some minor problems in v5 -- bogus Docbook as well as
outdated rules.out, small "git diff --check" complaint about whitespace.
This v6 (on today's master) fixes those, no other changes.
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
In the above after "." there is an extra space, is this intentional. I
had noticed that in lot of places there is couple of spaces and
sometimes single space across this file.
Like in below, there is single space after ".":
<entry>Time when this process' current transaction was started, or null
if no transaction is active. If the current
query is the first of its transaction, this column is equal to the
<structfield>query_start</structfield> column.
</entry>

Sorry for the late reply.

Probably, it is intentional because there are many extra space
in other documents. See below:

# Results of grep
=============
$ grep '\. ' doc/src/sgml/monitoring.sgml | wc -l
114
$ grep '\. ' doc/src/sgml/information_schema.sgml | wc -l
184
$ grep '\. ' doc/src/sgml/func.sgml | wc -l
577
=============

Therefore, I'm going to leave it as is. :)

Thanks,
Tatsuro Yamada

#30

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Tatsuro Yamada (#29)

2 attachment(s)

Re: progress report for ANALYZE

Hi Alvaro, vignesh,

I rebased the patch on 2a4d96eb, and added new column
"ext_compute_count" in pg_stat_progress_analyze vie to
report a number of computing extended stats.
It is like a "index_vacuum_count" in vacuum progress reporter or
"index_rebuild_count" in cluster progress reporter. :)

Please find attached file: v7.

And the following is a test result:
==============
[Session1]
\! pgbench -i
create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;

[Session2]
# \a \t
# select * from pg_stat_progress_analyze ; \watch 0.0001

27064|13583|postgres|16405|initializing|f|0|0|0|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|0|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|23|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|64|0
27064|13583|postgres|16405|acquiring sample rows|f|16405|1640|1640|0
27064|13583|postgres|16405|computing stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|0
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|1
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|2
27064|13583|postgres|16405|computing extended stats|f|16405|1640|1640|3
27064|13583|postgres|16405|finalizing analyze|f|16405|1640|1640|3

Note:
The result on Session2 was shortened for readability.
If you'd like to check the whole result, you can see attached file: "hoge.txt".
==============

Thanks,
Tatsuro Yamada

Attachments:

v7-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v7-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e6c49eebad..c84d9a0775 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3927,6 +3935,140 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether scanning through child tables.</entry>
+     </row>
+     <row>
+      <entry><structfield>current_relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table currently being scanned.
+        It might be different from relid when analyzing tables that have child tables.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>ext_compute_count</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of computed extended stats.  This counter only advances when the phase
+       is <literal>computing extended stats</literal>.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring sample rows</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>current_relid</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained during the table scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9fe4a4794a..f28bd6fb1e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,23 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'acquiring sample rows'
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'finalizing analyze'
+                      END AS phase,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS current_relid,
+        S.param4 AS sample_blks_total, S.param5 AS heap_blks_scanned,
+        S.param6 AS ext_compute_count
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7accb950eb..f0992cc605 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -318,6 +323,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	const int   st_index[] = {
+		PROGRESS_ANALYZE_PHASE,
+		PROGRESS_ANALYZE_INCLUDE_CHILDREN
+	};
+	int64 st_val[2];
 
 	if (inh)
 		ereport(elevel,
@@ -505,6 +515,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	/*
 	 * Acquire the sample rows
 	 */
+	st_val[0] =	PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS;
+	st_val[1] =	inh;
+	pgstat_progress_update_multi_param(2, st_index, st_val);
+
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -524,7 +538,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -592,10 +609,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * not for relations representing inheritance trees.
 		 */
 		if (!inh)
+		{
+			pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+										 PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS);
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 									   attr_cnt, vacattrstats);
+		}
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1034,6 +1058,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	BlockNumber	blksdone = 0;
+	const int   sb_index[] = {
+		PROGRESS_ANALYZE_TOTAL_BLOCKS,
+		PROGRESS_ANALYZE_SCANREL
+	};
+	int64 sb_val[2];
 
 	Assert(targrows > 0);
 
@@ -1043,7 +1074,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+
+	/* Report sampling block numbers */
+	sb_val[0] = nblocks;
+	sb_val[1] = RelationGetRelid(onerel);
+	pgstat_progress_update_multi_param(2, sb_index, sb_val);
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1104,6 +1141,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 207ee3160e..352920b147 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -24,10 +24,12 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
+#include "commands/progress.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
@@ -92,6 +94,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 	List	   *stats;
 	MemoryContext cxt;
 	MemoryContext oldcxt;
+	int64		ext_cnt;
 
 	cxt = AllocSetContextCreate(CurrentMemoryContext,
 								"BuildRelationExtStatistics",
@@ -100,6 +103,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
 	stats = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
+	ext_cnt = 0;
 
 	foreach(lc, stats)
 	{
@@ -161,6 +165,10 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 			else if (t == STATS_EXT_MCV)
 				mcv = statext_mcv_build(numrows, rows, stat->columns, stats,
 										totalrows, stattarget);
+
+			/* for reporting progress */
+			pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_COMPUTE_COUNT,
+										 ++ext_cnt);
 		}
 
 		/* store the statistics in the catalog */
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..db2cc5c316 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537979..f7daece5ee 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313cb3..00468cc474 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_INCLUDE_CHILDREN		1
+#define PROGRESS_ANALYZE_SCANREL				2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			3
+#define PROGRESS_ANALYZE_BLOCKS_DONE			4
+#define PROGRESS_ANALYZE_EXT_COMPUTE_COUNT		5
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS	1
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS		2
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS	3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE		4
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..df418af4e7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -955,6 +955,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507fb5..76d31dc126 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..bd2f90a916 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,25 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'acquiring sample rows'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS current_relid,
+    s.param4 AS sample_blks_total,
+    s.param5 AS heap_blks_scanned,
+	S.param6 AS ext_compute_count
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

hoge.txttext/plain; charset=UTF-8; name=hoge.txtDownload

#31

Alvaro Herrera

alvherre@2ndquadrant.com

about 6 years ago

In reply to: Tatsuro Yamada (#30)

Re: progress report for ANALYZE

On 2019-Nov-05, Tatsuro Yamada wrote:

==============
[Session1]
\! pgbench -i
create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;

Wow, it takes a long time to compute these ...

Hmm, you normally wouldn't define stats that way; you'd do this instead:

create statistics pg_ext1 (dependencies, mcv,ndistinct) ON aid, bid from pgbench_accounts;

I'm not sure if this has an important impact in practice. What I'm
saying is that I'm not sure that "number of ext stats" is necessarily a
useful number as shown. I wonder if it's possible to count the number
of items that have been computed for each stats object. So if you do
this

create statistics pg_ext1 (dependencies, mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (ndistinct,histogram) ON aid, bid from pgbench_accounts;

then the counter goes to 4. But I also wonder if we need to publish
_which_ type of ext stats is currently being built, in a separate
column.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#32

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Alvaro Herrera (#31)

1 attachment(s)

Re: progress report for ANALYZE

Hi Alvaro!

On 2019/11/05 22:38, Alvaro Herrera wrote:

On 2019-Nov-05, Tatsuro Yamada wrote:

==============
[Session1]
\! pgbench -i
create statistics pg_ext1 (dependencies) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext3 (ndistinct) ON aid, bid from pgbench_accounts;

Wow, it takes a long time to compute these ...

Hmm, you normally wouldn't define stats that way; you'd do this instead:

create statistics pg_ext1 (dependencies, mcv,ndistinct) ON aid, bid from pgbench_accounts;

I'd like to say it's a just example of test case. But I understand that
your advice. Thanks! :)

I'm not sure if this has an important impact in practice. What I'm
saying is that I'm not sure that "number of ext stats" is necessarily a
useful number as shown. I wonder if it's possible to count the number
of items that have been computed for each stats object. So if you do
this

create statistics pg_ext1 (dependencies, mcv) ON aid, bid from pgbench_accounts;
create statistics pg_ext2 (ndistinct,histogram) ON aid, bid from pgbench_accounts;

then the counter goes to 4. But I also wonder if we need to publish
_which_ type of ext stats is currently being built, in a separate
column.

Hmm... I have never seen a lot of extended stats on a table (with many columns)
but I suppose it will be existence near future because extended stats is an only
solution to correct row estimation error in vanilla PostgreSQL. Therefore, it
would be better to add the counter on the view, I think.

I revised the patch as following because I realized counting the types of ext
stats is not useful for users.

- Attached new patch counts a number of ext stats instead the types of ext stats.

So we can see the counter goes to "2", if we created above ext stats (pg_ext1 and
pg_ext2) and analyzed as you wrote. :)

Thanks,
Tatsuro Yamada

Attachments:

v8-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v8-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e6c49eebad..c84d9a0775 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
@@ -3481,7 +3489,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3927,6 +3935,140 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether scanning through child tables.</entry>
+     </row>
+     <row>
+      <entry><structfield>current_relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table currently being scanned.
+        It might be different from relid when analyzing tables that have child tables.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>ext_compute_count</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of computed extended stats.  This counter only advances when the phase
+       is <literal>computing extended stats</literal>.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring sample rows</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>current_relid</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained during the table scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9fe4a4794a..f28bd6fb1e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -964,6 +964,23 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'acquiring sample rows'
+                      WHEN 2 THEN 'computing stats'
+                      WHEN 3 THEN 'computing extended stats'
+                      WHEN 4 THEN 'finalizing analyze'
+                      END AS phase,
+        CAST(CAST(S.param2 AS int) AS boolean) AS include_children,
+        CAST(S.param3 AS oid) AS current_relid,
+        S.param4 AS sample_blks_total, S.param5 AS heap_blks_scanned,
+        S.param6 AS ext_compute_count
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 7accb950eb..f0992cc605 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -318,6 +323,11 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	const int   st_index[] = {
+		PROGRESS_ANALYZE_PHASE,
+		PROGRESS_ANALYZE_INCLUDE_CHILDREN
+	};
+	int64 st_val[2];
 
 	if (inh)
 		ereport(elevel,
@@ -505,6 +515,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	/*
 	 * Acquire the sample rows
 	 */
+	st_val[0] =	PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS;
+	st_val[1] =	inh;
+	pgstat_progress_update_multi_param(2, st_index, st_val);
+
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
@@ -524,7 +538,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -592,10 +609,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * not for relations representing inheritance trees.
 		 */
 		if (!inh)
+		{
+			pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+										 PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS);
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 									   attr_cnt, vacattrstats);
+		}
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1034,6 +1058,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	BlockNumber	blksdone = 0;
+	const int   sb_index[] = {
+		PROGRESS_ANALYZE_TOTAL_BLOCKS,
+		PROGRESS_ANALYZE_SCANREL
+	};
+	int64 sb_val[2];
 
 	Assert(targrows > 0);
 
@@ -1043,7 +1074,13 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+
+	/* Report sampling block numbers */
+	sb_val[0] = nblocks;
+	sb_val[1] = RelationGetRelid(onerel);
+	pgstat_progress_update_multi_param(2, sb_index, sb_val);
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1104,6 +1141,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 207ee3160e..2581a849f9 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -24,10 +24,12 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
+#include "commands/progress.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
@@ -92,6 +94,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 	List	   *stats;
 	MemoryContext cxt;
 	MemoryContext oldcxt;
+	int64		ext_cnt;
 
 	cxt = AllocSetContextCreate(CurrentMemoryContext,
 								"BuildRelationExtStatistics",
@@ -100,6 +103,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
 	stats = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
+	ext_cnt = 0;
 
 	foreach(lc, stats)
 	{
@@ -165,6 +169,10 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 		/* store the statistics in the catalog */
 		statext_store(stat->statOid, ndistinct, dependencies, mcv, stats);
+
+		/* for reporting progress */
+		pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_COMPUTE_COUNT,
+									 ++ext_cnt);
 	}
 
 	table_close(pg_stext, RowExclusiveLock);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..db2cc5c316 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537979..f7daece5ee 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313cb3..00468cc474 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,20 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_INCLUDE_CHILDREN		1
+#define PROGRESS_ANALYZE_SCANREL				2
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			3
+#define PROGRESS_ANALYZE_BLOCKS_DONE			4
+#define PROGRESS_ANALYZE_EXT_COMPUTE_COUNT		5
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS	1
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS		2
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS	3
+#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE		4
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..df418af4e7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -955,6 +955,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507fb5..76d31dc126 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..bd2f90a916 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1846,6 +1846,25 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_princ AS principal,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'acquiring sample rows'::text
+            WHEN 2 THEN 'computing stats'::text
+            WHEN 3 THEN 'computing extended stats'::text
+            WHEN 4 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    ((s.param2)::integer)::boolean AS include_children,
+    (s.param3)::oid AS current_relid,
+    s.param4 AS sample_blks_total,
+    s.param5 AS heap_blks_scanned,
+	S.param6 AS ext_compute_count
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

#33

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tatsuro Yamada (#32)

Re: progress report for ANALYZE

Yamada-san,

Thanks for working on this.

On Wed, Nov 6, 2019 at 2:50 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

I revised the patch as following because I realized counting the types of ext
stats is not useful for users.

- Attached new patch counts a number of ext stats instead the types of ext stats.

So we can see the counter goes to "2", if we created above ext stats (pg_ext1 and
pg_ext2) and analyzed as you wrote. :)

I have looked at the patch and here are some comments.

I think include_children and current_relid are not enough to
understand the progress of analyzing inheritance trees, because even
with current_relid being updated, I can't tell how many more there
will be. I think it'd be better to show the total number of children
and the number of children processed, just like
pg_stat_progress_create_index does for partitions. So, instead of
include_children and current_relid, I think it's better to have
child_tables_total, child_tables_done, and current_child_relid, placed
last in the set of columns.

Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows". In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not. For partitioned tables, the progress
output will immediately switch to this phase, because partitioned
table itself is empty so there's nothing to do in the "acquiring
sample rows" phase.

That's all for now.

Thanks,
Amit

#34

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Amit Langote (#33)

Re: progress report for ANALYZE

Hi Amit-san!

Thanks for your comments!

I have looked at the patch and here are some comments.

I think include_children and current_relid are not enough to
understand the progress of analyzing inheritance trees, because even
with current_relid being updated, I can't tell how many more there
will be. I think it'd be better to show the total number of children
and the number of children processed, just like
pg_stat_progress_create_index does for partitions. So, instead of
include_children and current_relid, I think it's better to have
child_tables_total, child_tables_done, and current_child_relid, placed
last in the set of columns.

Ah, I understood.
I'll check pg_stat_progress_create_index does for partitions,
and will create a new patch. :)

Related to the above,
I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.
Would it be added to the total number column to these views? :)

Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows". In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not. For partitioned tables, the progress
output will immediately switch to this phase, because partitioned
table itself is empty so there's nothing to do in the "acquiring
sample rows" phase.

That's all for now.

Okay! I'll also add the new phase "acquiring inherited sample rows" on
the next patch. :)

Thanks,
Tatsuro Yamada

#35

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Tatsuro Yamada (#34)

Re: progress report for ANALYZE

Hi Amit-san,

Related to the above,
I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.
Would it be added to the total number column to these views? :)

Oops, I made a mistake. :(

What I'd like to say was:
Would it be better to add the total number column to these views? :)

Thanks,
Tatsuro Yamada

#36

Alvaro Herrera

alvherre@2ndquadrant.com

about 6 years ago

In reply to: Tatsuro Yamada (#35)

Re: progress report for ANALYZE

On 2019-Nov-26, Tatsuro Yamada wrote:

I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.

Would it be better to add the total number column to these views? :)

Yeah, I think it would be good to add that.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#37

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Alvaro Herrera (#36)

Re: progress report for ANALYZE

Hi Alvaro!

On 2019/11/26 21:22, Alvaro Herrera wrote:

On 2019-Nov-26, Tatsuro Yamada wrote:

I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.

Would it be better to add the total number column to these views? :)

Yeah, I think it would be good to add that.

Thanks for your comment!
Okay, I'll add the column "ext_stats_total" to
pg_stat_progress_analyze view on the next patch. :)

Regarding to other total number columns,
I'll create another patch to add these columns "index_vacuum_total" and
"index_rebuild_count" on the other views. :)

Thanks,
Tatsuro Yamada

#38

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Alvaro Herrera (#36)

Re: progress report for ANALYZE

On Tue, Nov 26, 2019 at 9:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Nov-26, Tatsuro Yamada wrote:

I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.

Would it be better to add the total number column to these views? :)

Yeah, I think it would be good to add that.

Hmm, does it take that long to calculate ext stats on one column? The
reason it's worthwhile to have index_vacuum_count,
index_rebuild_count, etc. is because it can take very long for one
index to get vacuumed/rebuilt.

Thanks,
Amit

#39

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tatsuro Yamada (#37)

Re: progress report for ANALYZE

Yamada-san,

On Wed, Nov 27, 2019 at 11:04 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

Regarding to other total number columns,
I'll create another patch to add these columns "index_vacuum_total" and
"index_rebuild_count" on the other views. :)

Maybe you meant "index_rebuild_total"?

Thanks,
Amit

#40

Alvaro Herrera

alvherre@2ndquadrant.com

about 6 years ago

In reply to: Amit Langote (#38)

Re: progress report for ANALYZE

On 2019-Nov-27, Amit Langote wrote:

On Tue, Nov 26, 2019 at 9:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2019-Nov-26, Tatsuro Yamada wrote:

I wonder whether we need the total number of ext stats on
pg_stat_progress_analyze or not. As you might know, there is the same
counter on pg_stat_progress_vacuum and pg_stat_progress_cluster.
For example, index_vacuum_count and index_rebuild_count.

Would it be better to add the total number column to these views? :)

Yeah, I think it would be good to add that.

Hmm, does it take that long to calculate ext stats on one column? The
reason it's worthwhile to have index_vacuum_count,
index_rebuild_count, etc. is because it can take very long for one
index to get vacuumed/rebuilt.

Yes, it's noticeable. It's not as long as building an index, of course,
but it's a long enough fraction of the total analyze time that it should
be reported.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#41

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Amit Langote (#39)

Re: progress report for ANALYZE

Hi Amit-san,

On Wed, Nov 27, 2019 at 11:04 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

Regarding to other total number columns,
I'll create another patch to add these columns "index_vacuum_total" and
"index_rebuild_count" on the other views. :)

Maybe you meant "index_rebuild_total"?

Yeah, you are right! :)

Thanks,
Tatsuro Yamada

#42

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Tatsuro Yamada (#35)

2 attachment(s)

Re: progress report for ANALYZE

Hi Amit-san!

I think include_children and current_relid are not enough to
understand the progress of analyzing inheritance trees, because even
with current_relid being updated, I can't tell how many more there
will be. I think it'd be better to show the total number of children
and the number of children processed, just like
pg_stat_progress_create_index does for partitions. So, instead of
include_children and current_relid, I think it's better to have
child_tables_total, child_tables_done, and current_child_relid, placed
last in the set of columns.

Ah, I understood.
I'll check pg_stat_progress_create_index does for partitions,
and will create a new patch.

Fixed.

But I just remembered I replaced column name "*_table" with "*_relid"
based on Robert's comment three months ago, see below:

/me reviews.

+ <entry><structfield>scanning_table</structfield></entry>

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.

So, it would be better to use same rule against child_tables_total and
child_tables_done. Thus I changed these column names to others and added
to the view. I also removed include_children and current_relid.
The following columns are new version.

<New columns of the view>
pid
datid
datname
relid
phase
sample_blks_total
heap_blks_scanned
ext_stats_total <= Added (based on Alvaro's comment)
ext_stats_computed <= Renamed
child_relids_total <= Added
child_relids_done <= Added
current_child_relid <= Added

Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows". In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not. For partitioned tables, the progress
output will immediately switch to this phase, because partitioned
table itself is empty so there's nothing to do in the "acquiring
sample rows" phase.

That's all for now.

Okay! I'll also add the new phase "acquiring inherited sample rows" on
the next patch.

Fixed.

I tried to abbreviate it to "acquiring inh sample rows" because I thought
"acquiring inherited sample rows" is a little long for the phase name.

Attached WIP patch is including these fixes:
- Remove columns: include_children and current_relid
- Add new columns: child_relieds_total, child_relids_done and current_child_relid
- Add new phase "acquiring inh sample rows"

Note: the document is not updated, I'll fix it later. :)

Attached testcase.sql is for creating base table and partitioning table.
You can check the patch by using the following procedures, easily.

Terminal #1
--------------
\a \t
select * from pg_stat_progress_analyze; \watch 0.0001
--------------

Terminal #2
--------------
\i testcase.sql
analyze hoge;
analyze hoge2;
--------------

Thanks,
Tatsuro Yamada

Attachments:

v9-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v9-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a3c5f86b7e..462d3a5415 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
@@ -3502,7 +3510,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3948,6 +3956,140 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+     <row>
+      <entry><structfield>datid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>datname</structfield></entry>
+      <entry><type>name</type></entry>
+      <entry>Name of the database to which this backend is connected.</entry>
+     </row>
+     <row>
+      <entry><structfield>relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table being analyzed.</entry>
+     </row>
+     <row>
+      <entry><structfield>include_children</structfield></entry>
+      <entry><type>boolean</type></entry>
+      <entry>Whether scanning through child tables.</entry>
+     </row>
+     <row>
+      <entry><structfield>current_relid</structfield></entry>
+      <entry><type>oid</type></entry>
+      <entry>OID of the table currently being scanned.
+        It might be different from relid when analyzing tables that have child tables.
+      </entry>
+     </row>
+     <row>
+      <entry><structfield>phase</structfield></entry>
+      <entry><type>text</type></entry>
+      <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+     </row>
+     <row>
+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+     </entry>
+     </row>
+    <row>
+     <entry><structfield>heap_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>ext_compute_count</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of computed extended stats.  This counter only advances when the phase
+       is <literal>computing extended stats</literal>.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring sample rows</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>current_relid</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained during the table scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f7800f01a6..95dc0c35a0 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -969,6 +969,27 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'acquiring sample rows'
+                      WHEN 2 THEN 'acquiring inh sample rows'
+                      WHEN 3 THEN 'computing stats'
+                      WHEN 4 THEN 'computing extended stats'
+                      WHEN 5 THEN 'finalizing analyze'
+                      END AS phase,
+        S.param2 AS sample_blks_total,
+		S.param3 AS heap_blks_scanned,
+        S.param4 AS ext_stats_total,
+        S.param5 AS ext_stats_computed,
+        S.param6 AS child_relids_total,
+        S.param7 AS child_relids_done,
+        CAST(S.param8 AS oid) AS current_child_relid
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 71372ceb16..e454c62497 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -507,13 +512,23 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	 */
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
+	{
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_ACQUIRE_INH_SAMPLE_ROWS);
+
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
 												rows, targrows,
 												&totalrows, &totaldeadrows);
+	}
 	else
+	{
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS);
+
 		numrows = (*acquirefunc) (onerel, elevel,
 								  rows, targrows,
 								  &totalrows, &totaldeadrows);
+	}
 
 	/*
 	 * Compute the statistics.  Temporary results during the calculations for
@@ -524,7 +539,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -592,10 +610,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * not for relations representing inheritance trees.
 		 */
 		if (!inh)
+		{
+			pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+										 PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS);
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 									   attr_cnt, vacattrstats);
+		}
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1034,6 +1059,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	BlockNumber	blksdone = 0;
 
 	Assert(targrows > 0);
 
@@ -1043,7 +1070,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+
+	/* Report sampling block numbers */
+	pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_BLOCKS,
+								 nblocks);
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1104,6 +1136,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -1332,6 +1367,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	 * rels have radically different free-space percentages, but it's not
 	 * clear that it's worth working harder.)
 	 */
+	pgstat_progress_update_param(PROGRESS_ANALYZE_TOTAL_CHILD_RELIDS,
+								 nrels);
 	numrows = 0;
 	*totalrows = 0;
 	*totaldeadrows = 0;
@@ -1341,6 +1378,9 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
 		double		childblocks = relblocks[i];
 
+		pgstat_progress_update_param(PROGRESS_ANALYZE_CURRENT_CHILD_RELID,
+									 RelationGetRelid(childrel));
+
 		if (childblocks > 0)
 		{
 			int			childtargrows;
@@ -1396,6 +1436,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		 * pointers to their TOAST tables in the sampled rows.
 		 */
 		table_close(childrel, NoLock);
+		pgstat_progress_update_param(PROGRESS_ANALYZE_CHILD_RELIDS_DONE,
+									 i + 1);
 	}
 
 	return numrows;
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 207ee3160e..9d868f2835 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -24,10 +24,12 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
+#include "commands/progress.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
@@ -92,6 +94,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 	List	   *stats;
 	MemoryContext cxt;
 	MemoryContext oldcxt;
+	int64		ext_cnt;
 
 	cxt = AllocSetContextCreate(CurrentMemoryContext,
 								"BuildRelationExtStatistics",
@@ -100,6 +103,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
 	stats = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
+	ext_cnt = 0;
 
 	foreach(lc, stats)
 	{
@@ -165,6 +169,10 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 		/* store the statistics in the catalog */
 		statext_store(stat->statOid, ndistinct, dependencies, mcv, stats);
+
+		/* for reporting progress */
+		pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_STATS_COMPUTED,
+									 ++ext_cnt);
 	}
 
 	table_close(pg_stext, RowExclusiveLock);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..db2cc5c316 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537979..f7daece5ee 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313cb3..5da355694a 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,23 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE					0
+#define PROGRESS_ANALYZE_TOTAL_BLOCKS			1
+#define PROGRESS_ANALYZE_BLOCKS_DONE			2
+#define PROGRESS_ANALYZE_TOTAL_EXT_STATS		3
+#define PROGRESS_ANALYZE_EXT_STATS_COMPUTED		4
+#define PROGRESS_ANALYZE_TOTAL_CHILD_RELIDS		5
+#define PROGRESS_ANALYZE_CHILD_RELIDS_DONE		6
+#define PROGRESS_ANALYZE_CURRENT_CHILD_RELID	7
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS		1
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_INH_SAMPLE_ROWS	2
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS			3
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS		4
+#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE			5
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..df418af4e7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -955,6 +955,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507fb5..76d31dc126 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c9cc569404..00486ef64b 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1847,6 +1847,28 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
   WHERE (s.client_port IS NOT NULL);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'acquiring sample rows'::text
+            WHEN 2 THEN 'acquiring inh sample rows'::text
+            WHEN 3 THEN 'computing stats'::text
+            WHEN 4 THEN 'computing extended stats'::text
+            WHEN 5 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param2 AS sample_blks_total,
+    s.param3 AS heap_blks_scanned,
+    s.param4 AS ext_stats_total,
+    s.param5 AS ext_stats_computed,
+    s.param6 AS child_relids_total,
+    s.param7 AS child_relids_done,
+    (s.param8)::oid AS current_child_relid
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

testcase.sqltext/plain; charset=UTF-8; name=testcase.sqlDownload

#43

Michael Paquier

michael@paquier.xyz

about 6 years ago

In reply to: Tatsuro Yamada (#42)

Re: progress report for ANALYZE

On Wed, Nov 27, 2019 at 12:45:41PM +0900, Tatsuro Yamada wrote:

Fixed.

Patch was waiting on input from author, so I have switched it back to
"Needs review", and moved it to next CF while on it as you are working
on it.
--
Michael

#44

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tatsuro Yamada (#42)

Re: progress report for ANALYZE

Yamada-san,

Thank you for updating the patch.

On Wed, Nov 27, 2019 at 12:46 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

But I just remembered I replaced column name "*_table" with "*_relid"
based on Robert's comment three months ago, see below:

/me reviews.

+ <entry><structfield>scanning_table</structfield></entry>

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.

So, it would be better to use same rule against child_tables_total and
child_tables_done. Thus I changed these column names to others and added
to the view.

Robert's comment seems OK for a column that actually reports an OID,
but for columns that report counts, names containing "relids" sound a
bit strange to me. So, I prefer child_tables_total /
child_tables_done over child_relids_total / child_relids_done. Would
like to hear more opinions.

Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows". In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not.

Okay! I'll also add the new phase "acquiring inherited sample rows" on
the next patch.

Fixed.

I tried to abbreviate it to "acquiring inh sample rows" because I thought
"acquiring inherited sample rows" is a little long for the phase name.

I think phase names should contain full English words, because they
are supposed to be descriptive. Users are very likely to not
understand what "inh" means without looking up the docs.

Thanks,
Amit

#45

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Amit Langote (#44)

Re: progress report for ANALYZE

Hi Amit-san,

On 2019/11/28 10:59, Amit Langote wrote:

Yamada-san,

Thank you for updating the patch.

On Wed, Nov 27, 2019 at 12:46 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

But I just remembered I replaced column name "*_table" with "*_relid"
based on Robert's comment three months ago, see below:

/me reviews.

+ <entry><structfield>scanning_table</structfield></entry>

I think this should be retitled to something that ends in 'relid',
like all of the corresponding cases in existing progress views.
Perhaps 'active_relid' or 'current_relid'.

So, it would be better to use same rule against child_tables_total and
child_tables_done. Thus I changed these column names to others and added
to the view.

Robert's comment seems OK for a column that actually reports an OID,
but for columns that report counts, names containing "relids" sound a
bit strange to me. So, I prefer child_tables_total /
child_tables_done over child_relids_total / child_relids_done. Would
like to hear more opinions.

Hmmm... I understand your opinion but I'd like to get more opinions too. :)
Do you prefer these column names? See below:

<Columns of the view>
pid
datid
datname
relid
phase
sample_blks_total
heap_blks_scanned
ext_stats_total
ext_stats_computed
child_tables_total <= Renamed: s/relid/table/
child_tables_done <= Renamed: s/relid/table/
current_child_table <= Renamed: s/relid/table/

Also, inheritance tree stats are created *after* creating single table
stats, so I think that it would be better to have a distinct phase
name for that, say "acquiring inherited sample rows". In
do_analyze_rel(), you can select which of two phases to set based on
whether inh is true or not.

Okay! I'll also add the new phase "acquiring inherited sample rows" on
the next patch.

Fixed.

I tried to abbreviate it to "acquiring inh sample rows" because I thought
"acquiring inherited sample rows" is a little long for the phase name.

I think phase names should contain full English words, because they
are supposed to be descriptive. Users are very likely to not
understand what "inh" means without looking up the docs.

Okay, I will fix it.
s/acquiring inh sample rows/acquiring inherited sample rows/

Thanks,
Tatsuro Yamada

#46

Alvaro Herrera

alvherre@2ndquadrant.com

about 6 years ago

In reply to: Tatsuro Yamada (#45)

Re: progress report for ANALYZE

On 2019-Nov-28, Tatsuro Yamada wrote:

Hmmm... I understand your opinion but I'd like to get more opinions too. :)
Do you prefer these column names? See below:

Here's my take on it:

<Columns of the view>
pid
datid
datname
relid
phase
sample_blks_total
sample_blks_scanned
ext_stats_total
ext_stats_computed
child_tables_total
child_tables_done
current_child_table_relid

It seems to make sense to keep using the "child table" terminology in
that last column; but since the column carries an OID then as Robert
said it should have "relid" in the name. For the other two "child
tables" columns, not using "relid" is appropriate because what they have
is not relids.

I think there should be an obvious correspondence in columns that are
closely related, which there isn't if you use "sample" in one and "heap"
in the other, so I'd go for "sample" in both.

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#47

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Alvaro Herrera (#46)

Re: progress report for ANALYZE

Hi Alvaro!

Hmmm... I understand your opinion but I'd like to get more opinions too. :)
Do you prefer these column names? See below:

Here's my take on it:

<Columns of the view>
pid
datid
datname
relid
phase
sample_blks_total
sample_blks_scanned
ext_stats_total
ext_stats_computed
child_tables_total
child_tables_done
current_child_table_relid

It seems to make sense to keep using the "child table" terminology in
that last column; but since the column carries an OID then as Robert
said it should have "relid" in the name. For the other two "child
tables" columns, not using "relid" is appropriate because what they have
is not relids.

I think there should be an obvious correspondence in columns that are
closely related, which there isn't if you use "sample" in one and "heap"
in the other, so I'd go for "sample" in both.

Thanks for the comment.
Oops, You are right, I overlooked they are not relids..
I agreed with you and Amit's opinion so I'll send a revised patch on the next mail. :-)

Next patch will be included:
- New columns definition of the view (as above)
- Renamed the phase name: s/acquiring inh sample rows/acquiring inherited sample rows/
- Update document

Thanks,
Tatsuro Yamada

#48

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Michael Paquier (#43)

Re: progress report for ANALYZE

Hi Michael,

On 2019/11/27 13:25, Michael Paquier wrote:

On Wed, Nov 27, 2019 at 12:45:41PM +0900, Tatsuro Yamada wrote:

Fixed.

Patch was waiting on input from author, so I have switched it back to
"Needs review", and moved it to next CF while on it as you are working
on it.

Thanks for your CF manager work.
I will do my best to be committed at the next CF because
Progress reporting feature is useful for DBAs, as you know. :)

Thanks,
Tatsuro Yamada

#49

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Tatsuro Yamada (#47)

1 attachment(s)

Re: progress report for ANALYZE

Hi Alvaro and Amit!

On 2019/11/29 9:54, Tatsuro Yamada wrote:

Hi Alvaro!

Hmmm... I understand your opinion but I'd like to get more opinions too. :)
Do you prefer these column names? See below:

Here's my take on it:

<Columns of the view>
   pid
   datid
   datname
   relid
   phase
   sample_blks_total
   sample_blks_scanned
   ext_stats_total
   ext_stats_computed
   child_tables_total
   child_tables_done
   current_child_table_relid

It seems to make sense to keep using the "child table" terminology in
that last column; but since the column carries an OID then as Robert
said it should have "relid" in the name. For the other two "child
tables" columns, not using "relid" is appropriate because what they have
is not relids.

I think there should be an obvious correspondence in columns that are
closely related, which there isn't if you use "sample" in one and "heap"
in the other, so I'd go for "sample" in both.

Thanks for the comment.
Oops, You are right, I overlooked they are not relids..
I agreed with you and Amit's opinion so I'll send a revised patch on the next mail. :-)

Next patch will be included:
- New columns definition of the view (as above)
- Renamed the phase name: s/acquiring inh sample rows/acquiring inherited sample rows/
- Update document

Attached patch is the revised patch. :)

I wonder two things below. What do you think?

1)
For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See <Test result> bellow.

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?

<Document>
---------------------------------------------------------
<entry><literal>finalizing analyze</literal></entry>
<entry>
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end.
---------------------------------------------------------

<Test result using partitioning tables>
---------------------------------------------------------
# select * from pg_stat_progress_analyze ; \watch 0.0001

19309|13583|postgres|36081|acquiring inherited sample rows|0|0|0|0|0|0|0
19309|13583|postgres|36081|acquiring inherited sample rows|45|17|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|35|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|0|36084
19309|13583|postgres|36081|acquiring inherited sample rows|45|3|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|22|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|38|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|1|36087
19309|13583|postgres|36081|acquiring inherited sample rows|45|16|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|34|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|2|36090
19309|13583|postgres|36081|acquiring inherited sample rows|45|10|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|29|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|43|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093
19309|13583|postgres|36081|acquiring inherited sample rows|45|45|0|0|4|3|36093
19309|13583|postgres|36081|computing stats|45|45|0|0|4|4|36093 <== current_*_reid should be zero?
19309|13583|postgres|36081|computing stats|45|45|0|0|4|4|36093
19309|13583|postgres|36081|finalizing analyze|45|45|0|0|4|4|36093 <== there are many finalizing phases
19309|13583|postgres|36081|finalizing analyze|45|45|0|0|4|4|36093
19309|13583|postgres|36084|acquiring sample rows|45|3|0|0|0|0|0
19309|13583|postgres|36084|acquiring sample rows|45|33|0|0|0|0|0
19309|13583|postgres|36084|computing stats|45|45|0|0|0|0|0
19309|13583|postgres|36087|acquiring sample rows|45|15|0|0|0|0|0
19309|13583|postgres|36087|computing stats|45|45|0|0|0|0|0
19309|13583|postgres|36087|finalizing analyze|45|45|0|0|0|0|0 <== same as above
19309|13583|postgres|36090|acquiring sample rows|45|11|0|0|0|0|0
19309|13583|postgres|36090|acquiring sample rows|45|41|0|0|0|0|0
19309|13583|postgres|36090|finalizing analyze|45|45|0|0|0|0|0 <== same as above
19309|13583|postgres|36093|acquiring sample rows|45|7|0|0|0|0|0
19309|13583|postgres|36093|acquiring sample rows|45|37|0|0|0|0|0
19309|13583|postgres|36093|finalizing analyze|45|45|0|0|0|0|0 <== same as above
---------------------------------------------------------

Thanks,
Tatsuro Yamada

Attachments:

v10-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v10-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a3c5f86b7e..9564fad294 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
@@ -3502,7 +3510,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3948,6 +3956,164 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+    <row>
+     <entry><structfield>datid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which this backend is connected.</entry>
+    </row>
+    <row>
+     <entry><structfield>datname</structfield></entry>
+     <entry><type>name</type></entry>
+     <entry>Name of the database to which this backend is connected.</entry>
+    </row>
+    <row>
+     <entry><structfield>relid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the table being analyzed.</entry>
+    </row>
+    <row>
+     <entry><structfield>phase</structfield></entry>
+     <entry><type>text</type></entry>
+     <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+    </row>
+    <row>
+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>sample_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>ext_stats_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of extended stats.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>ext_stats_computed</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of computed extended stats.  This counter only advances when the phase
+       is <literal>computing extended stats</literal>.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>child_tables_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of child tables.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>child_tables_done</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of analyzed child tables.  This counter only advances when the phase
+       is <literal>computing extended stats</literal>.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>current_child_table_relid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the child table currently being scanned.
+       It might be different from relid when analyzing tables that have child tables.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring sample rows</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>current_relid</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring inherited sample rows</literal></entry>
+     <entry>
+       The command is currently scanning the <structfield>current_child_table_relid</structfield>
+       to obtain samples.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing stats</literal></entry>
+     <entry>
+       The command is computing stats from the samples obtained during the table scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples obtained in the previous phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f7800f01a6..afe45e0737 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -969,6 +969,27 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'acquiring sample rows'
+                      WHEN 2 THEN 'acquiring inherited sample rows'
+                      WHEN 3 THEN 'computing stats'
+                      WHEN 4 THEN 'computing extended stats'
+                      WHEN 5 THEN 'finalizing analyze'
+                      END AS phase,
+        S.param2 AS sample_blks_total,
+		S.param3 AS sample_blks_scanned,
+        S.param4 AS ext_stats_total,
+        S.param5 AS ext_stats_computed,
+        S.param6 AS child_tables_total,
+        S.param7 AS child_tables_done,
+        CAST(S.param8 AS oid) AS current_child_table_relid
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 71372ceb16..1be98e7cb3 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -507,13 +512,23 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	 */
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
+	{
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_ACQUIRE_INHERITED_SAMPLE_ROWS);
+
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
 												rows, targrows,
 												&totalrows, &totaldeadrows);
+	}
 	else
+	{
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS);
+
 		numrows = (*acquirefunc) (onerel, elevel,
 								  rows, targrows,
 								  &totalrows, &totaldeadrows);
+	}
 
 	/*
 	 * Compute the statistics.  Temporary results during the calculations for
@@ -524,7 +539,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -592,10 +610,17 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * not for relations representing inheritance trees.
 		 */
 		if (!inh)
+		{
+			pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+										 PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS);
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 									   attr_cnt, vacattrstats);
+		}
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1034,6 +1059,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	BlockNumber	blksdone = 0;
 
 	Assert(targrows > 0);
 
@@ -1043,7 +1070,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+
+	/* Report sampling block numbers */
+	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
+								 nblocks);
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1104,6 +1136,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -1332,6 +1367,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	 * rels have radically different free-space percentages, but it's not
 	 * clear that it's worth working harder.)
 	 */
+	pgstat_progress_update_param(PROGRESS_ANALYZE_CHILD_TABLES_TOTAL,
+								 nrels);
 	numrows = 0;
 	*totalrows = 0;
 	*totaldeadrows = 0;
@@ -1341,6 +1378,9 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
 		double		childblocks = relblocks[i];
 
+		pgstat_progress_update_param(PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID,
+									 RelationGetRelid(childrel));
+
 		if (childblocks > 0)
 		{
 			int			childtargrows;
@@ -1396,6 +1436,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		 * pointers to their TOAST tables in the sampled rows.
 		 */
 		table_close(childrel, NoLock);
+		pgstat_progress_update_param(PROGRESS_ANALYZE_CHILD_TABLES_DONE,
+									 i + 1);
 	}
 
 	return numrows;
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 9d339433f6..5bbdf6d820 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -24,10 +24,12 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
+#include "commands/progress.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
@@ -92,6 +94,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 	List	   *stats;
 	MemoryContext cxt;
 	MemoryContext oldcxt;
+	int64		ext_cnt;
 
 	cxt = AllocSetContextCreate(CurrentMemoryContext,
 								"BuildRelationExtStatistics",
@@ -101,6 +104,11 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
 	stats = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
 
+	/* for reporting progress */
+	pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_STATS_TOTAL,
+								 list_length(stats));
+
+	ext_cnt = 0;
 	foreach(lc, stats)
 	{
 		StatExtEntry *stat = (StatExtEntry *) lfirst(lc);
@@ -165,6 +173,10 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 		/* store the statistics in the catalog */
 		statext_store(stat->statOid, ndistinct, dependencies, mcv, stats);
+
+		/* for reporting progress */
+		pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_STATS_COMPUTED,
+									 ++ext_cnt);
 	}
 
 	table_close(pg_stext, RowExclusiveLock);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..db2cc5c316 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537979..f7daece5ee 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313cb3..c29c1e00fc 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,23 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE						0
+#define PROGRESS_ANALYZE_BLOCKS_TOTAL				1
+#define PROGRESS_ANALYZE_BLOCKS_DONE				2
+#define PROGRESS_ANALYZE_EXT_STATS_TOTAL			3
+#define PROGRESS_ANALYZE_EXT_STATS_COMPUTED			4
+#define PROGRESS_ANALYZE_CHILD_TABLES_TOTAL			5
+#define PROGRESS_ANALYZE_CHILD_TABLES_DONE			6
+#define PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID	7
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS				1
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_INHERITED_SAMPLE_ROWS	2
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS					3
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS				4
+#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE					5
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..df418af4e7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -955,6 +955,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507fb5..76d31dc126 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index c9cc569404..af5e27fe6c 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1847,6 +1847,28 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
   WHERE (s.client_port IS NOT NULL);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'acquiring sample rows'::text
+            WHEN 2 THEN 'acquiring inherited sample rows'::text
+            WHEN 3 THEN 'computing stats'::text
+            WHEN 4 THEN 'computing extended stats'::text
+            WHEN 5 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param2 AS sample_blks_total,
+    s.param3 AS sample_blks_scanned,
+    s.param4 AS ext_stats_total,
+    s.param5 AS ext_stats_computed,
+    s.param6 AS child_tables_total,
+    s.param7 AS child_tables_done,
+    (s.param8)::oid AS current_child_table_relid
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

#50

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tatsuro Yamada (#49)

Re: progress report for ANALYZE

Yamada-san,

On Fri, Nov 29, 2019 at 5:45 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

Attached patch is the revised patch. :)

I wonder two things below. What do you think?

1)
For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See <Test result> bellow.

In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change. I'm not sure his
point applies to this case too though.

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?

<Document>
---------------------------------------------------------
<entry><literal>finalizing analyze</literal></entry>
<entry>
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end.
---------------------------------------------------------

When a partitioned table is analyzed, its partitions are analyzed too.
So, the ANALYZE command effectively runs N + 1 times if there are N
partitions -- first analyze partitioned table to collect "inherited"
statistics by collecting row samples using
acquire_inherited_sample_rows(), then each partition to collect its
own statistics. Note that this recursive application to ANALYZE to
partitions (child tables) only occurs for partitioned tables, not for
legacy inheritance.

Thanks,
Amit

#51

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Amit Langote (#50)

Re: progress report for ANALYZE

Hi Amit-san,

Thanks for your comments!

Attached patch is the revised patch. :)

I wonder two things below. What do you think?

1)
For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See <Test result> bellow.

In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change. I'm not sure his
point applies to this case too though.

Yeah, I understood.
I'll check target relid of "computing stats" to re-read a code of
analyze command later. :)

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?

<Document>
---------------------------------------------------------
<entry><literal>finalizing analyze</literal></entry>
<entry>
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end.
---------------------------------------------------------

When a partitioned table is analyzed, its partitions are analyzed too.
So, the ANALYZE command effectively runs N + 1 times if there are N
partitions -- first analyze partitioned table to collect "inherited"
statistics by collecting row samples using
acquire_inherited_sample_rows(), then each partition to collect its
own statistics. Note that this recursive application to ANALYZE to
partitions (child tables) only occurs for partitioned tables, not for
legacy inheritance.

Thanks for your explanation.
I understand Analyzing Partitioned table a little.
Below is my understanding. Is it right?

==================================================
In the case of partitioned table (N = 3)

- Partitioned table name: p (relid is 100)
- Partitioning table names: p1, p2, p3 (relids are 201, 202 and 203)

For now, We can get the following results by executing "analyze p;".

Num Phase relid current_child_table_relid
1 acquire inherited sample rows 100 201
2 acquire inherited sample rows 100 202
3 acquire inherited sample rows 100 203
4 computing stats 100 0
5 finalizing analyze 100 0

6 acquiring sample rows 201 0
7 computing stats 201 0
8 finalizing analyze 201 0

9 acquiring sample rows 202 0
10 computing stats 202 0
11 finalizing analyze 202 0

12 acquiring sample rows 203 0
13 computing stats 203 0
14 finalizing analyze 203 0
==================================================

Thanks,
Tatsuro Yamada

#52

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Tatsuro Yamada (#51)

Re: progress report for ANALYZE

Hi Amit-san,

I wonder two things below. What do you think?

1)
For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See <Test result> bellow.

In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change. I'm not sure his
point applies to this case too though.

Yeah, I understood.
I'll check target relid of "computing stats" to re-read a code of
analyze command later. :)

Finally, I understood after investigation of the code. :)
Call stack is the following, and analyze_rel() calls "N + 1" times
for partitioned table and each partitions.

analyze_rel start
do_analyze_rel inh==true start
onerel: hoge2
acq_inh_sample_rows start
childrel: hoge2_10000
childrel: hoge2_20000
childrel: hoge2_30000
childrel: hoge2_default
acq_inh_sample_rows end
compute_stats start
compute_stats end
compute_index_stats start
compute_index_stats end
finalizing start
finalizing end
do_analyze_rel inh==true end
analyze_rel end
...

Also, I checked my test result. ("//" is my comments)

# select relid,
current_child_table_relid,
phase,
sample_blks_total,
sample_blks_scanned,
ext_stats_total,
ext_stats_computed,
child_tables_total,
child_tables_done
from pg_stat_progress_analyze; \watch 0.00001

== for partitioned table hoge2 ==
//hoge2_10000
36081|36084|acquiring inherited sample rows|45|20|0|0|4|0
36081|36084|acquiring inherited sample rows|45|42|0|0|4|0
36081|36084|acquiring inherited sample rows|45|45|0|0|4|0
36081|36084|acquiring inherited sample rows|45|45|0|0|4|0

//hoge2_20000
36081|36087|acquiring inherited sample rows|45|3|0|0|4|1
36081|36087|acquiring inherited sample rows|45|31|0|0|4|1
36081|36087|acquiring inherited sample rows|45|45|0|0|4|1
36081|36087|acquiring inherited sample rows|45|45|0|0|4|1

//hoge2_30000
36081|36090|acquiring inherited sample rows|45|12|0|0|4|2
36081|36090|acquiring inherited sample rows|45|35|0|0|4|2
36081|36090|acquiring inherited sample rows|45|45|0|0|4|2
36081|36090|acquiring inherited sample rows|45|45|0|0|4|2

//hoge2_default
36081|36093|acquiring inherited sample rows|45|18|0|0|4|3
36081|36093|acquiring inherited sample rows|45|38|0|0|4|3
36081|36093|acquiring inherited sample rows|45|45|0|0|4|3
36081|36093|acquiring inherited sample rows|45|45|0|0|4|3

//Below "computing stats" is for the partitioned table hoge,
//therefore the second column from the left side would be
//better to set Zero to easy to understand.
//I guessd that user think which relid is the target of
//"computing stats"?!
//Of course, other option is to write it on document.

36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|computing stats |45|45|0|0|4|4
36081|36093|finalizing analyze |45|45|0|0|4|4

== for each partitions such as hoge2_10000 ... hoge2_default ==

//hoge2_10000
36084|0|acquiring sample rows |45|25|0|0|0|0
36084|0|computing stats |45|45|0|0|0|0
36084|0|computing extended stats|45|45|0|0|0|0
36084|0|finalizing analyze |45|45|0|0|0|0

//hoge2_20000
36087|0|acquiring sample rows |45|14|0|0|0|0
36087|0|computing stats |45|45|0|0|0|0
36087|0|computing extended stats|45|45|0|0|0|0
36087|0|finalizing analyze |45|45|0|0|0|0

//hoge2_30000
36090|0|acquiring sample rows |45|12|0|0|0|0
36090|0|acquiring sample rows |45|44|0|0|0|0
36090|0|computing extended stats|45|45|0|0|0|0
36090|0|finalizing analyze |45|45|0|0|0|0

//hoge2_default
36093|0|acquiring sample rows |45|10|0|0|0|0
36093|0|acquiring sample rows |45|43|0|0|0|0
36093|0|computing extended stats|45|45|0|0|0|0
36093|0|finalizing analyze |45|45|0|0|0|0

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?

<Document>
---------------------------------------------------------
       <entry><literal>finalizing analyze</literal></entry>
       <entry>
         The command is updating pg_class. When this phase is completed,
         <command>ANALYZE</command> will end.
---------------------------------------------------------

When a partitioned table is analyzed, its partitions are analyzed too.
So, the ANALYZE command effectively runs N + 1 times if there are N
partitions -- first analyze partitioned table to collect "inherited"
statistics by collecting row samples using
acquire_inherited_sample_rows(), then each partition to collect its
own statistics. Note that this recursive application to ANALYZE to
partitions (child tables) only occurs for partitioned tables, not for
legacy inheritance.

Thanks for your explanation.
I understand Analyzing Partitioned table a little.

It would be better to modify the document of "finalizing analyze" phase.

# Before modify
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end.

# Modified
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end. In the case of partitioned table,
it might be shown on each partitions.

What do you think that? I'm going to fix it, if you agreed. :)

Thanks,
Tatsuro Yamada

#53

Amit Langote

amitlangote09@gmail.com

about 6 years ago

In reply to: Tatsuro Yamada (#52)

Re: progress report for ANALYZE

Yamada-san,

On Fri, Dec 6, 2019 at 3:24 PM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:

1)

For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See <Test result> bellow.

In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change. I'm not sure his
point applies to this case too though.

//Below "computing stats" is for the partitioned table hoge,
//therefore the second column from the left side would be
//better to set Zero to easy to understand.

On second thought, maybe we should not reset, because that might be
considered loss of information. To avoid confusion, we should simply
document that current_child_table_relid is only valid during the
"acquiring inherited sample rows" phase.

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?

It would be better to modify the document of "finalizing analyze" phase.

# Before modify
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end.

# Modified
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end. In the case of partitioned table,
it might be shown on each partitions.

What do you think that? I'm going to fix it, if you agreed. :)

*All* phases are repeated in this case, not not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes. ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition. So, I don't think we need to clarify that if only in
one phase's description. Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables. Like this:

<note>
<para>
Note that when <command>ANALYZE</command> is run on a partitioned table,
all of its partitions are also recursively analyzed as also mentioned on
<xref linkend="sql-analyze"/>. In that case, <command>ANALYZE</command>
progress is reported first for the parent table, whereby its inheritance
statistics are collected, followed by that for each partition.
</para>
</note>

Some more comments on the documentation:

+       Number of computed extended stats.  This counter only advances
when the phase
+       is <literal>computing extended stats</literal>.

Number of computer extended stats -> Number of extended stats computed

+       Number of analyzed child tables.  This counter only advances
when the phase
+       is <literal>computing extended stats</literal>.

Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE. Also, the 2nd sentence is wrong -- you
meant "when the phase is <literal>acquiring inherited sample
rows</literal>. I suggest to write this as follows:

Number of child tables scanned. This counter only advances when the phase
is <literal>acquiring inherited sample rows</literal>.

+     <entry>OID of the child table currently being scanned.
+       It might be different from relid when analyzing tables that
have child tables.

I suggest:

OID of the child table currently being scanned. This field is only valid when
the phase is <literal>computing extended stats</literal>.

+       The command is currently scanning the
<structfield>current_relid</structfield>
+       to obtain samples.

I suggest:

The command is currently scanning the the table given by
<structfield>current_relid</structfield> to obtain sample rows.

+       The command is currently scanning the
<structfield>current_child_table_relid</structfield>
+       to obtain samples.

I suggest (based on phase description pg_stat_progress_create_index
phase descriptions):

The command is currently scanning child tables to obtain sample rows. Columns
<structfield>child_tables_total</structfield>,
<structfield>child_tables_done</structfield>, and
<structfield>current_child_table_relid</structfield> contain the progress
information for this phase.

+    <row>
+     <entry><literal>computing stats</literal></entry>

I think the phase name should really be "computing statistics", that
is, use the full word.

+     <entry>
+       The command is computing stats from the samples obtained
during the table scan.
+     </entry>
+    </row>

So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan

+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples
obtained in the previous phase.
+     </entry>

I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.

Thanks,
Amit

#54

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Amit Langote (#53)

Re: progress report for ANALYZE

Hi Amit-san,

1)

For now, I'm not sure it should be set current_child_table_relid to zero
when the current phase is changed from "acquiring inherited sample rows" to
"computing stats". See <Test result> bellow.

In the upthread discussion [1], Robert asked to *not* do such things,
that is, resetting some values due to phase change. I'm not sure his
point applies to this case too though.

//Below "computing stats" is for the partitioned table hoge,
//therefore the second column from the left side would be
//better to set Zero to easy to understand.

On second thought, maybe we should not reset, because that might be
considered loss of information. To avoid confusion, we should simply
document that current_child_table_relid is only valid during the
"acquiring inherited sample rows" phase.

Okay, agreed. :)

2)
There are many "finalizing analyze" phases based on relids in the case
of partitioning tables. Would it better to fix the document? or it
would be better to reduce it to one?

It would be better to modify the document of "finalizing analyze" phase.

# Before modify
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end.

# Modified
The command is updating pg_class. When this phase is completed,
<command>ANALYZE</command> will end. In the case of partitioned table,
it might be shown on each partitions.

What do you think that? I'm going to fix it, if you agreed. :)

*All* phases are repeated in this case, not not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes. ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition. So, I don't think we need to clarify that if only in
one phase's description. Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables. Like this:

<note>
<para>
Note that when <command>ANALYZE</command> is run on a partitioned table,
all of its partitions are also recursively analyzed as also mentioned on
<xref linkend="sql-analyze"/>. In that case, <command>ANALYZE</command>
progress is reported first for the parent table, whereby its inheritance
statistics are collected, followed by that for each partition.
</para>
</note>

Ah.. you are right: All phases are repeated, it shouldn't be fixed
the only one phase's description.

Some more comments on the documentation:
+       Number of computed extended stats.  This counter only advances
when the phase
+       is <literal>computing extended stats</literal>.
Number of computed extended stats -> Number of extended stats computed

Will fix.

+       Number of analyzed child tables.  This counter only advances
when the phase
+       is <literal>computing extended stats</literal>.
Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE. Also, the 2nd sentence is wrong -- you
meant "when the phase is <literal>acquiring inherited sample
rows</literal>. I suggest to write this as follows:

Number of child tables scanned. This counter only advances when the phase
is <literal>acquiring inherited sample rows</literal>.

Oops, I will fix it. :)

+     <entry>OID of the child table currently being scanned.
+       It might be different from relid when analyzing tables that
have child tables.
I suggest:

OID of the child table currently being scanned. This field is only valid when
the phase is <literal>computing extended stats</literal>.

Will fix.

+       The command is currently scanning the
<structfield>current_relid</structfield>
+       to obtain samples.
I suggest:

The command is currently scanning the the table given by
<structfield>current_relid</structfield> to obtain sample rows.

Will fix.

+       The command is currently scanning the
<structfield>current_child_table_relid</structfield>
+       to obtain samples.
I suggest (based on phase description pg_stat_progress_create_index
phase descriptions):

The command is currently scanning child tables to obtain sample rows. Columns
<structfield>child_tables_total</structfield>,
<structfield>child_tables_done</structfield>, and
<structfield>current_child_table_relid</structfield> contain the progress
information for this phase.

Will fix.

+    <row>
+     <entry><literal>computing stats</literal></entry>
I think the phase name should really be "computing statistics", that
is, use the full word.

Will fix.

+     <entry>
+       The command is computing stats from the samples obtained
during the table scan.
+     </entry>
+    </row>
So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan

Will fix.

+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples
obtained in the previous phase.
+     </entry>
I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.

Will fix.

Thanks for your above useful suggestions. It helps me a lot. :)

Cheers!
Tatsuro Yamada

#55

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

about 6 years ago

In reply to: Tatsuro Yamada (#54)

1 attachment(s)

Re: progress report for ANALYZE

Hi All,

*All* phases are repeated in this case, not not just "finalizing
analyze", because ANALYZE repeatedly runs for each partition after the
parent partitioned table's ANALYZE finishes. ANALYZE's documentation
mentions that analyzing a partitioned table also analyzes all of its
partitions, so users should expect to see the progress information for
each partition. So, I don't think we need to clarify that if only in
one phase's description. Maybe we can add a note after the phase
description table which mentions this implementation detail about
partitioned tables. Like this:

   <note>
    <para>
     Note that when <command>ANALYZE</command> is run on a partitioned table,
     all of its partitions are also recursively analyzed as also mentioned on
     <xref linkend="sql-analyze"/>. In that case, <command>ANALYZE</command>
     progress is reported first for the parent table, whereby its inheritance
     statistics are collected, followed by that for each partition.
    </para>
   </note>

Ah.. you are right: All phases are repeated, it shouldn't be fixed
the only one phase's description.
Some more comments on the documentation:
+       Number of computed extended stats.  This counter only advances
when the phase
+       is <literal>computing extended stats</literal>.
Number of computed extended stats -> Number of extended stats computed
Will fix.
+       Number of analyzed child tables.  This counter only advances
when the phase
+       is <literal>computing extended stats</literal>.
Regarding, "Number of analyzed child table", note that we don't
"analyze" child tables in this phase, only scan its blocks to collect
samples for parent's ANALYZE. Also, the 2nd sentence is wrong -- you
meant "when the phase is <literal>acquiring inherited sample
rows</literal>. I suggest to write this as follows:

Number of child tables scanned. This counter only advances when the phase
is <literal>acquiring inherited sample rows</literal>.
Oops, I will fix it. :)
+     <entry>OID of the child table currently being scanned.
+       It might be different from relid when analyzing tables that
have child tables.
I suggest:

OID of the child table currently being scanned. This field is only valid when
the phase is <literal>computing extended stats</literal>.
Will fix.
+       The command is currently scanning the
<structfield>current_relid</structfield>
+       to obtain samples.
I suggest:

The command is currently scanning the the table given by
<structfield>current_relid</structfield> to obtain sample rows.
Will fix.
+       The command is currently scanning the
<structfield>current_child_table_relid</structfield>
+       to obtain samples.
I suggest (based on phase description pg_stat_progress_create_index
phase descriptions):

The command is currently scanning child tables to obtain sample rows. Columns
<structfield>child_tables_total</structfield>,
<structfield>child_tables_done</structfield>, and
<structfield>current_child_table_relid</structfield> contain the progress
information for this phase.
Will fix.
+    <row>
+     <entry><literal>computing stats</literal></entry>
I think the phase name should really be "computing statistics", that
is, use the full word.
Will fix.
+     <entry>
+       The command is computing stats from the samples obtained
during the table scan.
+     </entry>
+    </row>
So I suggest:

The command is computing statistics from the sample rows obtained during
the table scan
Will fix.
+     <entry><literal>computing extended stats</literal></entry>
+     <entry>
+       The command is computing extended stats from the samples
obtained in the previous phase.
+     </entry>
I suggest:

The command is computing extended statistics from the sample rows obtained
during the table scan.
Will fix.

I fixed the document based on Amit's comments. :)
Please find attached file.

Thanks,
Tatsuro Yamadas

Attachments:

v11-Report-progress-for-ANALYZE.patchtext/plain; charset=UTF-8; name=v11-Report-progress-for-ANALYZE.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index a3c5f86b7e..a5da126b8f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -360,6 +360,14 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
       </entry>
      </row>
 
+     <row>
+      <entry><structname>pg_stat_progress_analyze</structname><indexterm><primary>pg_stat_progress_analyze</primary></indexterm></entry>
+      <entry>One row for each backend (including autovacuum worker processes) running
+       <command>ANALYZE</command>, showing current progress.
+       See <xref linkend='analyze-progress-reporting'/>.
+      </entry>
+     </row>
+
      <row>
       <entry><structname>pg_stat_progress_cluster</structname><indexterm><primary>pg_stat_progress_cluster</primary></indexterm></entry>
       <entry>One row for each backend running
@@ -3502,7 +3510,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
    <productname>PostgreSQL</productname> has the ability to report the progress of
    certain commands during command execution.  Currently, the only commands
    which support progress reporting are <command>CREATE INDEX</command>,
-   <command>VACUUM</command> and
+   <command>VACUUM</command>, <command>ANALYZE</command> and
    <command>CLUSTER</command>. This may be expanded in the future.
   </para>
 
@@ -3948,6 +3956,179 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
  </sect2>
 
+ <sect2 id="analyze-progress-reporting">
+  <title>ANALYZE Progress Reporting</title>
+
+  <para>
+   Whenever <command>ANALYZE</command> is running, the
+   <structname>pg_stat_progress_analyze</structname> view will contain a
+   row for each backend that is currently running that command.  The tables
+   below describe the information that will be reported and provide
+   information about how to interpret it.
+  </para>
+
+  <table id="pg-stat-progress-analyze-view" xreflabel="pg_stat_progress_analyze">
+   <title><structname>pg_stat_progress_analyze</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>pid</structfield></entry>
+     <entry><type>integer</type></entry>
+     <entry>Process ID of backend.</entry>
+    </row>
+    <row>
+     <entry><structfield>datid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the database to which this backend is connected.</entry>
+    </row>
+    <row>
+     <entry><structfield>datname</structfield></entry>
+     <entry><type>name</type></entry>
+     <entry>Name of the database to which this backend is connected.</entry>
+    </row>
+    <row>
+     <entry><structfield>relid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the table being analyzed.</entry>
+    </row>
+    <row>
+     <entry><structfield>phase</structfield></entry>
+     <entry><type>text</type></entry>
+     <entry>Current processing phase. See <xref linkend="analyze-phases" /></entry>
+    </row>
+    <row>
+     <entry><structfield>sample_blks_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Total number of heap blocks that will be sampled.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>sample_blks_scanned</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of heap blocks scanned.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>ext_stats_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of extended statistics.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>ext_stats_computed</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of computed extended statistics computed.  This counter only advances when
+       the phase is <literal>computing extended statistics</literal>.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>child_tables_total</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of child tables.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>child_tables_done</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>
+       Number of child tables scanned.  This counter only advances when the phase
+       is <literal>acquiring inherited sample rows</literal>.
+     </entry>
+    </row>
+    <row>
+     <entry><structfield>current_child_table_relid</structfield></entry>
+     <entry><type>oid</type></entry>
+     <entry>OID of the child table currently being scanned. This field is only valid when
+       the phase is <literal>computing extended statistics</literal>.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <table id="analyze-phases">
+   <title>ANALYZE phases</title>
+   <tgroup cols="2">
+    <thead>
+    <row>
+      <entry>Phase</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+   <tbody>
+    <row>
+     <entry><literal>initializing</literal></entry>
+     <entry>
+       The command is preparing to begin scanning the heap.  This phase is
+       expected to be very brief.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring sample rows</literal></entry>
+     <entry>
+       The command is currently scanning the table given by
+       <structfield>current_relid</structfield> to obtain sample rows.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>acquiring inherited sample rows</literal></entry>
+     <entry>
+       The command is currently scanning child tables to obtain sample rows. Columns
+       <structfield>child_tables_total</structfield>,
+       <structfield>child_tables_done</structfield>, and
+       <structfield>current_child_table_relid</structfield> contain the progress
+       information for this phase.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing statistics</literal></entry>
+     <entry>
+       The command is computing statistics from the samples rows obtained during
+       the table scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>computing extended statistics</literal></entry>
+     <entry>
+       The command is computing extended statistics from the samples rows obtained
+       durring the table scan.
+     </entry>
+    </row>
+    <row>
+     <entry><literal>finalizing analyze</literal></entry>
+     <entry>
+       The command is updating pg_class. When this phase is completed, 
+       <command>ANALYZE</command> will end.
+     </entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <note>
+   <para>
+    Note that when <command>ANALYZE</command> is run on a partitioned table,
+    all of its partitions are also recursively analyzed as also mentioned on
+    <xref linkend="sql-analyze"/>.  In that case, <command>ANALYZE</command>
+    progress is reported first for the parent table, whereby its inheritance
+    statistics are collected, followed by that for each partition.
+   </para>
+  </note>
+ </sect2>
+
  <sect2 id="cluster-progress-reporting">
   <title>CLUSTER Progress Reporting</title>
 
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f7800f01a6..40195de1a5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -969,6 +969,27 @@ CREATE VIEW pg_stat_progress_vacuum AS
     FROM pg_stat_get_progress_info('VACUUM') AS S
         LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_analyze AS
+    SELECT
+        S.pid AS pid, S.datid AS datid, D.datname AS datname,
+        CAST(S.relid AS oid) AS relid,
+        CASE S.param1 WHEN 0 THEN 'initializing'
+                      WHEN 1 THEN 'acquiring sample rows'
+                      WHEN 2 THEN 'acquiring inherited sample rows'
+                      WHEN 3 THEN 'computing statistics'
+                      WHEN 4 THEN 'computing extended statistics'
+                      WHEN 5 THEN 'finalizing analyze'
+                      END AS phase,
+        S.param2 AS sample_blks_total,
+        S.param3 AS sample_blks_scanned,
+        S.param4 AS ext_stats_total,
+        S.param5 AS ext_stats_computed,
+        S.param6 AS child_tables_total,
+        S.param7 AS child_tables_done,
+        CAST(S.param8 AS oid) AS current_child_table_relid
+    FROM pg_stat_get_progress_info('ANALYZE') AS S
+        LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_stat_progress_cluster AS
     SELECT
         S.pid AS pid,
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 71372ceb16..187e619170 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -35,6 +35,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_statistic_ext.h"
 #include "commands/dbcommands.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "executor/executor.h"
@@ -251,6 +252,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
 	MyPgXact->vacuumFlags |= PROC_IN_ANALYZE;
 	LWLockRelease(ProcArrayLock);
+	pgstat_progress_start_command(PROGRESS_COMMAND_ANALYZE,
+								  RelationGetRelid(onerel));
 
 	/*
 	 * Do the normal non-recursive ANALYZE.  We can skip this for partitioned
@@ -275,6 +278,8 @@ analyze_rel(Oid relid, RangeVar *relation,
 	 */
 	relation_close(onerel, NoLock);
 
+	pgstat_progress_end_command();
+
 	/*
 	 * Reset my PGXACT flag.  Note: we need this here, and not in vacuum_rel,
 	 * because the vacuum flag is cleared by the end-of-xact code.
@@ -507,13 +512,23 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	 */
 	rows = (HeapTuple *) palloc(targrows * sizeof(HeapTuple));
 	if (inh)
+	{
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_ACQUIRE_INHERITED_SAMPLE_ROWS);
+
 		numrows = acquire_inherited_sample_rows(onerel, elevel,
 												rows, targrows,
 												&totalrows, &totaldeadrows);
+	}
 	else
+	{
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS);
+
 		numrows = (*acquirefunc) (onerel, elevel,
 								  rows, targrows,
 								  &totalrows, &totaldeadrows);
+	}
 
 	/*
 	 * Compute the statistics.  Temporary results during the calculations for
@@ -524,7 +539,10 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 	if (numrows > 0)
 	{
 		MemoryContext col_context,
-					old_context;
+					  old_context;
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+									 PROGRESS_ANALYZE_PHASE_COMPUTE_STATS);
 
 		col_context = AllocSetContextCreate(anl_context,
 											"Analyze Column",
@@ -592,10 +610,18 @@ do_analyze_rel(Relation onerel, VacuumParams *params,
 		 * not for relations representing inheritance trees.
 		 */
 		if (!inh)
+		{
+			pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+										 PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS);
+
 			BuildRelationExtStatistics(onerel, totalrows, numrows, rows,
 									   attr_cnt, vacattrstats);
+		}
 	}
 
+	pgstat_progress_update_param(PROGRESS_ANALYZE_PHASE,
+								 PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE);
+
 	/*
 	 * Update pages/tuples stats in pg_class ... but not if we're doing
 	 * inherited stats.
@@ -1034,6 +1060,8 @@ acquire_sample_rows(Relation onerel, int elevel,
 	ReservoirStateData rstate;
 	TupleTableSlot *slot;
 	TableScanDesc scan;
+	BlockNumber	nblocks;
+	BlockNumber	blksdone = 0;
 
 	Assert(targrows > 0);
 
@@ -1043,7 +1071,12 @@ acquire_sample_rows(Relation onerel, int elevel,
 	OldestXmin = GetOldestXmin(onerel, PROCARRAY_FLAGS_VACUUM);
 
 	/* Prepare for sampling block numbers */
-	BlockSampler_Init(&bs, totalblocks, targrows, random());
+	nblocks = BlockSampler_Init(&bs, totalblocks, targrows, random());
+
+	/* Report sampling block numbers */
+	pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_TOTAL,
+								 nblocks);
+
 	/* Prepare for sampling rows */
 	reservoir_init_selection_state(&rstate, targrows);
 
@@ -1104,6 +1137,9 @@ acquire_sample_rows(Relation onerel, int elevel,
 
 			samplerows += 1;
 		}
+
+		pgstat_progress_update_param(PROGRESS_ANALYZE_BLOCKS_DONE,
+									 ++blksdone);
 	}
 
 	ExecDropSingleTupleTableSlot(slot);
@@ -1332,6 +1368,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 	 * rels have radically different free-space percentages, but it's not
 	 * clear that it's worth working harder.)
 	 */
+	pgstat_progress_update_param(PROGRESS_ANALYZE_CHILD_TABLES_TOTAL,
+								 nrels);
 	numrows = 0;
 	*totalrows = 0;
 	*totaldeadrows = 0;
@@ -1341,6 +1379,9 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		AcquireSampleRowsFunc acquirefunc = acquirefuncs[i];
 		double		childblocks = relblocks[i];
 
+		pgstat_progress_update_param(PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID,
+									 RelationGetRelid(childrel));
+
 		if (childblocks > 0)
 		{
 			int			childtargrows;
@@ -1396,6 +1437,8 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 		 * pointers to their TOAST tables in the sampled rows.
 		 */
 		table_close(childrel, NoLock);
+		pgstat_progress_update_param(PROGRESS_ANALYZE_CHILD_TABLES_DONE,
+									 i + 1);
 	}
 
 	return numrows;
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 9d339433f6..5bbdf6d820 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -24,10 +24,12 @@
 #include "catalog/pg_collation.h"
 #include "catalog/pg_statistic_ext.h"
 #include "catalog/pg_statistic_ext_data.h"
+#include "commands/progress.h"
 #include "miscadmin.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
@@ -92,6 +94,7 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 	List	   *stats;
 	MemoryContext cxt;
 	MemoryContext oldcxt;
+	int64		ext_cnt;
 
 	cxt = AllocSetContextCreate(CurrentMemoryContext,
 								"BuildRelationExtStatistics",
@@ -101,6 +104,11 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 	pg_stext = table_open(StatisticExtRelationId, RowExclusiveLock);
 	stats = fetch_statentries_for_relation(pg_stext, RelationGetRelid(onerel));
 
+	/* for reporting progress */
+	pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_STATS_TOTAL,
+								 list_length(stats));
+
+	ext_cnt = 0;
 	foreach(lc, stats)
 	{
 		StatExtEntry *stat = (StatExtEntry *) lfirst(lc);
@@ -165,6 +173,10 @@ BuildRelationExtStatistics(Relation onerel, double totalrows,
 
 		/* store the statistics in the catalog */
 		statext_store(stat->statOid, ndistinct, dependencies, mcv, stats);
+
+		/* for reporting progress */
+		pgstat_progress_update_param(PROGRESS_ANALYZE_EXT_STATS_COMPUTED,
+									 ++ext_cnt);
 	}
 
 	table_close(pg_stext, RowExclusiveLock);
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..db2cc5c316 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -469,6 +469,8 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 	/* Translate command name into command type code. */
 	if (pg_strcasecmp(cmd, "VACUUM") == 0)
 		cmdtype = PROGRESS_COMMAND_VACUUM;
+	else if (pg_strcasecmp(cmd, "ANALYZE") == 0)
+		cmdtype = PROGRESS_COMMAND_ANALYZE;
 	else if (pg_strcasecmp(cmd, "CLUSTER") == 0)
 		cmdtype = PROGRESS_COMMAND_CLUSTER;
 	else if (pg_strcasecmp(cmd, "CREATE INDEX") == 0)
diff --git a/src/backend/utils/misc/sampling.c b/src/backend/utils/misc/sampling.c
index d2a1537979..f7daece5ee 100644
--- a/src/backend/utils/misc/sampling.c
+++ b/src/backend/utils/misc/sampling.c
@@ -32,8 +32,10 @@
  * Since we know the total number of blocks in advance, we can use the
  * straightforward Algorithm S from Knuth 3.4.2, rather than Vitter's
  * algorithm.
+ *
+ * Returns the number of blocks that BlockSampler_Next will return.
  */
-void
+BlockNumber
 BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 				  long randseed)
 {
@@ -48,6 +50,8 @@ BlockSampler_Init(BlockSampler bs, BlockNumber nblocks, int samplesize,
 	bs->m = 0;					/* blocks selected so far */
 
 	sampler_random_init_state(randseed, bs->randstate);
+
+	return Min(bs->n, bs->N);
 }
 
 bool
diff --git a/src/include/commands/progress.h b/src/include/commands/progress.h
index acd1313cb3..c29c1e00fc 100644
--- a/src/include/commands/progress.h
+++ b/src/include/commands/progress.h
@@ -34,6 +34,23 @@
 #define PROGRESS_VACUUM_PHASE_TRUNCATE			5
 #define PROGRESS_VACUUM_PHASE_FINAL_CLEANUP		6
 
+/* Progress parameters for analyze */
+#define PROGRESS_ANALYZE_PHASE						0
+#define PROGRESS_ANALYZE_BLOCKS_TOTAL				1
+#define PROGRESS_ANALYZE_BLOCKS_DONE				2
+#define PROGRESS_ANALYZE_EXT_STATS_TOTAL			3
+#define PROGRESS_ANALYZE_EXT_STATS_COMPUTED			4
+#define PROGRESS_ANALYZE_CHILD_TABLES_TOTAL			5
+#define PROGRESS_ANALYZE_CHILD_TABLES_DONE			6
+#define PROGRESS_ANALYZE_CURRENT_CHILD_TABLE_RELID	7
+
+/* Phases of analyze (as advertised via PROGRESS_ANALYZE_PHASE) */
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_SAMPLE_ROWS				1
+#define PROGRESS_ANALYZE_PHASE_ACQUIRE_INHERITED_SAMPLE_ROWS	2
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_STATS					3
+#define PROGRESS_ANALYZE_PHASE_COMPUTE_EXT_STATS				4
+#define PROGRESS_ANALYZE_PHASE_FINALIZE_ANALYZE					5
+
 /* Progress parameters for cluster */
 #define PROGRESS_CLUSTER_COMMAND				0
 #define PROGRESS_CLUSTER_PHASE					1
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..df418af4e7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -955,6 +955,7 @@ typedef enum ProgressCommandType
 {
 	PROGRESS_COMMAND_INVALID,
 	PROGRESS_COMMAND_VACUUM,
+	PROGRESS_COMMAND_ANALYZE,
 	PROGRESS_COMMAND_CLUSTER,
 	PROGRESS_COMMAND_CREATE_INDEX
 } ProgressCommandType;
diff --git a/src/include/utils/sampling.h b/src/include/utils/sampling.h
index 541b507fb5..76d31dc126 100644
--- a/src/include/utils/sampling.h
+++ b/src/include/utils/sampling.h
@@ -37,7 +37,7 @@ typedef struct
 
 typedef BlockSamplerData *BlockSampler;
 
-extern void BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
+extern BlockNumber BlockSampler_Init(BlockSampler bs, BlockNumber nblocks,
 							  int samplesize, long randseed);
 extern bool BlockSampler_HasMore(BlockSampler bs);
 extern BlockNumber BlockSampler_Next(BlockSampler bs);
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 80a07825b9..eb21983a55 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1847,6 +1847,28 @@ pg_stat_gssapi| SELECT s.pid,
     s.gss_enc AS encrypted
    FROM pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event, xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid, backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial, ssl_issuer_dn, gss_auth, gss_princ, gss_enc)
   WHERE (s.client_port IS NOT NULL);
+pg_stat_progress_analyze| SELECT s.pid,
+    s.datid,
+    d.datname,
+    s.relid,
+        CASE s.param1
+            WHEN 0 THEN 'initializing'::text
+            WHEN 1 THEN 'acquiring sample rows'::text
+            WHEN 2 THEN 'acquiring inherited sample rows'::text
+            WHEN 3 THEN 'computing statistics'::text
+            WHEN 4 THEN 'computing extended statistics'::text
+            WHEN 5 THEN 'finalizing analyze'::text
+            ELSE NULL::text
+        END AS phase,
+    s.param2 AS sample_blks_total,
+    s.param3 AS sample_blks_scanned,
+    s.param4 AS ext_stats_total,
+    s.param5 AS ext_stats_computed,
+    s.param6 AS child_tables_total,
+    s.param7 AS child_tables_done,
+    (s.param8)::oid AS current_child_table_relid
+   FROM (pg_stat_get_progress_info('ANALYZE'::text) s(pid, datid, relid, param1, param2, param3, param4, param5, param6, param7, param8, param9, param10, param11, param12, param13, param14, param15, param16, param17, param18, param19, param20)
+     LEFT JOIN pg_database d ON ((s.datid = d.oid)));
 pg_stat_progress_cluster| SELECT s.pid,
     s.datid,
     d.datname,

#56

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Tatsuro Yamada (#55)

Re: progress report for ANALYZE

I just pushed this after some small extra tweaks.

Thanks, Yamada-san, for seeing this to completion!

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#57

Justin Pryzby

pryzby@telsasoft.com

almost 6 years ago

In reply to: Alvaro Herrera (#56)

1 attachment(s)

Re: progress report for ANALYZE

On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:

I just pushed this after some small extra tweaks.

Thanks, Yamada-san, for seeing this to completion!

Find attached minor fixes to docs - sorry I didn't look earlier.

Possibly you'd also want to change the other existing instances of "preparing
to begin".

Attachments:

v1-0001-Doc-review-for-ANALYZE-progress-a166d408.patchtext/x-diff; charset=us-asciiDownload

From de108e69b5d33c881074b0a04697d7061684f823 Mon Sep 17 00:00:00 2001
From: Justin Pryzby <pryzbyj@telsasoft.com>
Date: Wed, 15 Jan 2020 23:10:29 -0600
Subject: [PATCH v1] Doc review for ANALYZE progress (a166d408)

---
 doc/src/sgml/monitoring.sgml | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 8b44fb1..10871b7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3525,7 +3525,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   <para>
    Whenever <command>ANALYZE</command> is running, the
    <structname>pg_stat_progress_analyze</structname> view will contain a
-   row for each backend that is currently running that command.  The tables
+   row for each backend currently running <command>ANALYZE</command>.  The tables
    below describe the information that will be reported and provide
    information about how to interpret it.
   </para>
@@ -3635,7 +3635,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
     <row>
      <entry><literal>initializing</literal></entry>
      <entry>
-       The command is preparing to begin scanning the heap.  This phase is
+       The command is preparing to scan the heap.  This phase is
        expected to be very brief.
      </entry>
     </row>
@@ -3643,7 +3643,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
      <entry><literal>acquiring sample rows</literal></entry>
      <entry>
        The command is currently scanning the table given by
-       <structfield>current_relid</structfield> to obtain sample rows.
+       <structfield>relid</structfield> to obtain sample rows.
      </entry>
     </row>
     <row>
@@ -3659,14 +3659,14 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
     <row>
      <entry><literal>computing statistics</literal></entry>
      <entry>
-       The command is computing statistics from the samples rows obtained during
+       The command is computing statistics from the sample rows obtained during
        the table scan.
      </entry>
     </row>
     <row>
      <entry><literal>computing extended statistics</literal></entry>
      <entry>
-       The command is computing extended statistics from the samples rows obtained
+       The command is computing extended statistics from the sample rows obtained
        durring the table scan.
      </entry>
     </row>
-- 
2.7.4

#58

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

almost 6 years ago

In reply to: Alvaro Herrera (#56)

Re: progress report for ANALYZE

Hi Alvaro and All reviewers,

On 2020/01/16 2:11, Alvaro Herrera wrote:

I just pushed this after some small extra tweaks.

Thanks, Yamada-san, for seeing this to completion!

Thanks for reviewing and committing the patch!
Hope this helps DBA. :-D

P.S.
Next up is progress reporting for query execution?!
To create it, I guess that it needs to improve
progress reporting infrastructure.

Thanks,
Tatsuro Yamada

#59

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Justin Pryzby (#57)

Re: progress report for ANALYZE

On Fri, Jan 17, 2020 at 12:19 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:

I just pushed this after some small extra tweaks.

Thanks, Yamada-san, for seeing this to completion!

Find attached minor fixes to docs - sorry I didn't look earlier.

Possibly you'd also want to change the other existing instances of "preparing
to begin".

Spotted a few other issues with the docs:

+ Number of computed extended statistics computed.

Should be: "Number of extended statistics computed."

+     <entry>OID of the child table currently being scanned. This
field is only valid when
+       the phase is <literal>computing extended statistics</literal>.

Should be: "This field is only valid when the phase is
<literal>acquiring inherited sample rows</literal>."

+ durring the table scan.

during

Attached a patch.

Thanks,
Amit

#60

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Amit Langote (#59)

1 attachment(s)

Re: progress report for ANALYZE

On Wed, Jan 22, 2020 at 2:52 PM Amit Langote <amitlangote09@gmail.com> wrote:

On Fri, Jan 17, 2020 at 12:19 AM Justin Pryzby <pryzby@telsasoft.com> wrote:

On Wed, Jan 15, 2020 at 02:11:10PM -0300, Alvaro Herrera wrote:

I just pushed this after some small extra tweaks.

Thanks, Yamada-san, for seeing this to completion!

Find attached minor fixes to docs - sorry I didn't look earlier.

Possibly you'd also want to change the other existing instances of "preparing
to begin".

Spotted a few other issues with the docs:

+ Number of computed extended statistics computed.

Should be: "Number of extended statistics computed."
+     <entry>OID of the child table currently being scanned. This
field is only valid when
+       the phase is <literal>computing extended statistics</literal>.
Should be: "This field is only valid when the phase is
<literal>acquiring inherited sample rows</literal>."

+ durring the table scan.

during

Attached a patch.

Oops, really attached this time.

Thanks,
Amit

Attachments:

a166d408eb0-doc-fixes.patchtext/plain; charset=US-ASCII; name=a166d408eb0-doc-fixes.patchDownload

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0bfd6151c4..67fbd9660b 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -3592,7 +3592,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
      <entry><structfield>ext_stats_computed</structfield></entry>
      <entry><type>bigint</type></entry>
      <entry>
-       Number of computed extended statistics computed.  This counter only advances when
+       Number of extended statistics computed.  This counter only advances when
        the phase is <literal>computing extended statistics</literal>.
      </entry>
     </row>
@@ -3615,7 +3615,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
      <entry><structfield>current_child_table_relid</structfield></entry>
      <entry><type>oid</type></entry>
      <entry>OID of the child table currently being scanned. This field is only valid when
-       the phase is <literal>computing extended statistics</literal>.
+       the phase is <literal>acquiring inherited sample rows</literal>.
      </entry>
     </row>
    </tbody>
@@ -3667,7 +3667,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
      <entry><literal>computing extended statistics</literal></entry>
      <entry>
        The command is computing extended statistics from the samples rows obtained
-       durring the table scan.
+       during the table scan.
      </entry>
     </row>
     <row>

#61

Michael Paquier

michael@paquier.xyz

almost 6 years ago

In reply to: Amit Langote (#60)

Re: progress report for ANALYZE

On Wed, Jan 22, 2020 at 03:06:52PM +0900, Amit Langote wrote:

Oops, really attached this time.

Thanks, applied. There were clearly two grammar mistakes in the first
patch sent by Justin. And your suggestions look fine to me. On top
of that, I have noticed that the indentation of the two tables in the
docs was rather inconsistent.
--
Michael

#62

Alvaro Herrera

alvherre@2ndquadrant.com

almost 6 years ago

In reply to: Tatsuro Yamada (#58)

Re: progress report for ANALYZE

On 2020-Jan-22, Tatsuro Yamada wrote:

Thanks for reviewing and committing the patch!
Hope this helps DBA. :-D

I'm sure it does!

P.S.
Next up is progress reporting for query execution?!

Actually, I think it's ALTER TABLE.

Also, things like VACUUM could report the progress of the index being
processed ...

--
ï¿½lvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

#63

Amit Langote

amitlangote09@gmail.com

almost 6 years ago

In reply to: Alvaro Herrera (#62)

Re: progress report for ANALYZE

On Fri, Jan 24, 2020 at 6:47 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Jan-22, Tatsuro Yamada wrote:

P.S.
Next up is progress reporting for query execution?!

Actually, I think it's ALTER TABLE.

+1. Existing infrastructure might be enough to cover ALTER TABLE's
needs, whereas we will very likely need to build entirely different
infrastructure for tracking the progress for SQL query execution.

Thanks,
Amit

#64

Tatsuro Yamada

tatsuro.yamada.tf@nttcom.co.jp

almost 6 years ago

In reply to: Amit Langote (#63)

Re: progress report for ANALYZE

Hi,

On 2020/01/24 23:44, Amit Langote wrote:

On Fri, Jan 24, 2020 at 6:47 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

On 2020-Jan-22, Tatsuro Yamada wrote:

P.S.
Next up is progress reporting for query execution?!

Actually, I think it's ALTER TABLE.

+1. Existing infrastructure might be enough to cover ALTER TABLE's
needs, whereas we will very likely need to build entirely different
infrastructure for tracking the progress for SQL query execution.

Yeah, I agree.
I will create a little POC patch after reading tablecmds.c and alter.c to
investigate how to report progress. :)

Regards,
Tatsuro Yamada