mat views stats

Started by Jim Mlodgenskialmost 9 years ago12 messages
#1Jim Mlodgenski
jimmy76@gmail.com
1 attachment(s)

I've come across a number of times where the statistics on materialized
views become stale producing bad plans. It turns out that autovacuum only
touches a materialized view when it is first created and ignores it on a
refresh. When you have a materialized view like yesterdays_sales the data
in the materialized view turns over every day.

Attached is a patch to trigger autovacuum based on a matview refresh along
with a system view pg_stat_all_matviews to show information more meaningful
for materialized views.

-- Jim

Attachments:

stat_matview.patchtext/x-patch; charset=US-ASCII; name=stat_matview.patchDownload
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index fad5cb0..ec27e2c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -436,6 +436,21 @@ postgres   27093  0.0  0.0  30096  2752 ?        Ss   11:34   0:00 postgres: ser
      </row>
 
      <row>
+      <entry><structname>pg_stat_all_matviews</><indexterm><primary>pg_stat_all_matviews</primary></indexterm></entry>
+      <entry>
+       One row for each materialized view in the current database, showing statistics
+       about accesses to that specific materialized view.
+       See <xref linkend="pg-stat-all-matviews-view"> for details.
+      </entry>
+     </row>
+
+     <row>
+      <entry><structname>pg_stat_user_matviews</><indexterm><primary>pg_stat_user_matviews</primary></indexterm></entry>
+      <entry>Same as <structname>pg_stat_all_matviews</>, except that only
+      user materialized views are shown.</entry>
+     </row>
+
+     <row>
       <entry><structname>pg_statio_all_tables</><indexterm><primary>pg_statio_all_tables</primary></indexterm></entry>
       <entry>
        One row for each table in the current database, showing statistics
@@ -2277,6 +2292,97 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
    </para>
   </note>
 
+  <table id="pg-stat-all-matviews-view" xreflabel="pg_stat_all_matviews">
+   <title><structname>pg_stat_all_matviews</structname> View</title>
+   <tgroup cols="3">
+    <thead>
+    <row>
+      <entry>Column</entry>
+      <entry>Type</entry>
+      <entry>Description</entry>
+     </row>
+    </thead>
+
+   <tbody>
+    <row>
+     <entry><structfield>relid</></entry>
+     <entry><type>oid</></entry>
+     <entry>OID of this materialize view</entry>
+    </row>
+    <row>
+     <entry><structfield>schemaname</></entry>
+     <entry><type>name</></entry>
+     <entry>Name of the schema that this materialized view is in</entry>
+    </row>
+    <row>
+     <entry><structfield>relname</></entry>
+     <entry><type>name</></entry>
+     <entry>Name of this materialized view</entry>
+    </row>
+    <row>
+     <entry><structfield>seq_scan</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of sequential scans initiated on this materialized view</entry>
+    </row>
+    <row>
+     <entry><structfield>seq_tup_read</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of live rows fetched by sequential scans</entry>
+    </row>
+    <row>
+     <entry><structfield>idx_scan</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of index scans initiated on this materialized ciew</entry>
+    </row>
+    <row>
+     <entry><structfield>idx_tup_fetch</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of live rows fetched by index scans</entry>
+    </row>
+    <row>
+     <entry><structfield>last_refresh</></entry>
+     <entry><type>timestamp with time zone</></entry>
+     <entry>Last time at which this materialized view was refreshed</entry>
+    </row>
+    <row>
+     <entry><structfield>refresh_count</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of times this materialized view has been refreshed</entry>
+    </row>
+    <row>
+     <entry><structfield>last_analyze</></entry>
+     <entry><type>timestamp with time zone</></entry>
+     <entry>Last time at which this materialized view was manually analyzed</entry>
+    </row>
+    <row>
+     <entry><structfield>last_autoanalyze</></entry>
+     <entry><type>timestamp with time zone</></entry>
+     <entry>Last time at which this materialized ciew was analyzed by the
+      autovacuum daemon</entry>
+    </row>
+    <row>
+     <entry><structfield>analyze_count</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of times this materialized view has been manually analyzed</entry>
+    </row>
+    <row>
+     <entry><structfield>autoanalyze_count</></entry>
+     <entry><type>bigint</></entry>
+     <entry>Number of times this materialized view has been analyzed by the
+      autovacuum daemon</entry>
+    </row>
+   </tbody>
+   </tgroup>
+  </table>
+
+  <para>
+   The <structname>pg_stat_all_matviews</structname> view will contain
+   one row for each materialized view in the current database, showing 
+   statistics about accesses to that specific materialized view. The
+   <structname>pg_stat_user_matviews</structname> contain the same 
+   information, but filtered to only show user materialized views.
+  </para>
+
   <table id="pg-statio-all-tables-view" xreflabel="pg_statio_all_tables">
    <title><structname>pg_statio_all_tables</structname> View</title>
    <tgroup cols="3">
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 816483e..8d60d48 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -164,6 +164,15 @@ static relopt_int intRelOpts[] =
 		},
 		-1, 0, INT_MAX
 	},
+        {
+                {
+                        "autovacuum_analyze_refresh_threshold",
+                        "Minimum number of materialized view refreshes prior to analyze",
+                        RELOPT_KIND_HEAP,
+                        ShareUpdateExclusiveLock
+                },
+                -1, 0, INT_MAX
+        },
 	{
 		{
 			"autovacuum_vacuum_cost_delay",
@@ -1283,6 +1292,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, vacuum_threshold)},
 		{"autovacuum_analyze_threshold", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_threshold)},
+                {"autovacuum_analyze_refresh_threshold", RELOPT_TYPE_INT,
+                offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, analyze_ref_threshold)},
 		{"autovacuum_vacuum_cost_delay", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) +offsetof(AutoVacOpts, vacuum_cost_delay)},
 		{"autovacuum_vacuum_cost_limit", RELOPT_TYPE_INT,
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 38be9cf..07b9f1c 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -535,6 +535,28 @@ CREATE VIEW pg_stat_xact_all_tables AS
     WHERE C.relkind IN ('r', 't', 'm')
     GROUP BY C.oid, N.nspname, C.relname;
 
+CREATE VIEW pg_stat_all_matviews AS
+    SELECT
+            C.oid AS relid,
+            N.nspname AS schemaname,
+            C.relname AS relname,
+            pg_stat_get_numscans(C.oid) AS seq_scan,
+            pg_stat_get_tuples_returned(C.oid) AS seq_tup_read,
+            sum(pg_stat_get_numscans(I.indexrelid))::bigint AS idx_scan,
+            sum(pg_stat_get_tuples_fetched(I.indexrelid))::bigint +
+            pg_stat_get_tuples_fetched(C.oid) AS idx_tup_fetch,
+            pg_stat_get_last_refresh_time(C.oid) as last_refresh,
+            pg_stat_get_refresh_count(C.oid) AS refresh_count,
+            pg_stat_get_last_analyze_time(C.oid) as last_analyze,
+            pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze,
+            pg_stat_get_analyze_count(C.oid) AS analyze_count,
+            pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count
+    FROM pg_class C LEFT JOIN
+         pg_index I ON C.oid = I.indrelid
+         LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
+    WHERE C.relkind = 'm'
+    GROUP BY C.oid, N.nspname, C.relname;
+
 CREATE VIEW pg_stat_sys_tables AS
     SELECT * FROM pg_stat_all_tables
     WHERE schemaname IN ('pg_catalog', 'information_schema') OR
@@ -555,6 +577,11 @@ CREATE VIEW pg_stat_xact_user_tables AS
     WHERE schemaname NOT IN ('pg_catalog', 'information_schema') AND
           schemaname !~ '^pg_toast';
 
+CREATE VIEW pg_stat_user_matviews AS
+    SELECT * FROM pg_stat_all_matviews
+    WHERE schemaname NOT IN ('pg_catalog', 'information_schema') AND
+          schemaname !~ '^pg_toast';
+
 CREATE VIEW pg_statio_all_tables AS
     SELECT
             C.oid AS relid,
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a18c917..b4ad098 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -320,6 +320,9 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	SetUserIdAndSecContext(relowner,
 						   save_sec_context | SECURITY_RESTRICTED_OPERATION);
 
+	/* report the refresh to the stats collector */
+	pgstat_report_refresh(matviewOid);
+
 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
 		refresh_matview_datafill(dest, dataQuery, queryString);
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 0c5ffa0..42f7a1d 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -115,6 +115,7 @@ int			autovacuum_naptime;
 int			autovacuum_vac_thresh;
 double		autovacuum_vac_scale;
 int			autovacuum_anl_thresh;
+int			autovacuum_anl_ref_thresh;
 double		autovacuum_anl_scale;
 int			autovacuum_freeze_max_age;
 int			autovacuum_multixact_freeze_max_age;
@@ -2757,7 +2758,8 @@ relation_needs_vacanalyze(Oid relid,
 
 	/* constants from reloptions or GUC variables */
 	int			vac_base_thresh,
-				anl_base_thresh;
+				anl_base_thresh,
+				anl_ref_thresh;
 	float4		vac_scale_factor,
 				anl_scale_factor;
 
@@ -2768,6 +2770,7 @@ relation_needs_vacanalyze(Oid relid,
 	/* number of vacuum (resp. analyze) tuples at this time */
 	float4		vactuples,
 				anltuples;
+	int		anlrefresh;
 
 	/* freeze parameters */
 	int			freeze_max_age;
@@ -2801,6 +2804,10 @@ relation_needs_vacanalyze(Oid relid,
 		? relopts->analyze_threshold
 		: autovacuum_anl_thresh;
 
+        anl_ref_thresh = (relopts && relopts->analyze_ref_threshold >= 0)
+                ? relopts->analyze_ref_threshold
+                : autovacuum_anl_ref_thresh;
+
 	freeze_max_age = (relopts && relopts->freeze_max_age >= 0)
 		? Min(relopts->freeze_max_age, autovacuum_freeze_max_age)
 		: autovacuum_freeze_max_age;
@@ -2848,6 +2855,7 @@ relation_needs_vacanalyze(Oid relid,
 		reltuples = classForm->reltuples;
 		vactuples = tabentry->n_dead_tuples;
 		anltuples = tabentry->changes_since_analyze;
+		anlrefresh = tabentry->refreshes_since_analyze;
 
 		vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
 		anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
@@ -2861,9 +2869,13 @@ relation_needs_vacanalyze(Oid relid,
 			 NameStr(classForm->relname),
 			 vactuples, vacthresh, anltuples, anlthresh);
 
-		/* Determine if this table needs vacuum or analyze. */
+		/* Determine if this table or materialzed view  needs vacuum or analyze. */
 		*dovacuum = force_vacuum || (vactuples > vacthresh);
 		*doanalyze = (anltuples > anlthresh);
+
+		/* Check if a materialized view was refreshed and needs to be analyzed */
+		if (!*doanalyze && classForm->relkind == RELKIND_MATVIEW)
+			*doanalyze = (anlrefresh >= anl_ref_thresh);	
 	}
 	else
 	{
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index ada374c..2f86409 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -294,6 +294,7 @@ static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, in
 static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
 static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
 static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
+static void pgstat_recv_refresh(PgStat_MsgRefresh *msg, int len);
 static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
 static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
 static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
@@ -1400,6 +1401,28 @@ pgstat_report_analyze(Relation rel,
 	pgstat_send(&msg, sizeof(msg));
 }
 
+/* ---------
+ * pgstat_report_refresh() -
+ *
+ *      Tell the collector about the materialed view we just refreshed.
+ * ---------
+ */
+void
+pgstat_report_refresh(Oid tableoid)
+{
+	PgStat_MsgRefresh msg;
+
+	if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+		return;
+
+	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_REFRESH);
+	msg.m_databaseid = MyDatabaseId;
+	msg.m_tableoid = tableoid;
+	msg.m_refreshtime = GetCurrentTimestamp();
+	pgstat_send(&msg, sizeof(msg));
+}
+
+
 /* --------
  * pgstat_report_recovery_conflict() -
  *
@@ -3878,6 +3901,10 @@ PgstatCollectorMain(int argc, char *argv[])
 					pgstat_recv_analyze((PgStat_MsgAnalyze *) &msg, len);
 					break;
 
+				case PGSTAT_MTYPE_REFRESH:
+					pgstat_recv_refresh((PgStat_MsgRefresh *) &msg, len);
+					break;
+
 				case PGSTAT_MTYPE_ARCHIVER:
 					pgstat_recv_archiver((PgStat_MsgArchiver *) &msg, len);
 					break;
@@ -4101,6 +4128,9 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
 		result->analyze_count = 0;
 		result->autovac_analyze_timestamp = 0;
 		result->autovac_analyze_count = 0;
+		result->refresh_timestamp = 0;
+		result->refresh_count = 0;
+		result->refreshes_since_analyze = 0;
 	}
 
 	return result;
@@ -5211,6 +5241,9 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
 			tabentry->analyze_count = 0;
 			tabentry->autovac_analyze_timestamp = 0;
 			tabentry->autovac_analyze_count = 0;
+			tabentry->refresh_timestamp = 0;
+			tabentry->refresh_count = 0;
+			tabentry->refreshes_since_analyze = 0;
 		}
 		else
 		{
@@ -5507,8 +5540,11 @@ pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len)
 	 * have no good way to estimate how many of those there were.
 	 */
 	if (msg->m_resetcounter)
+	{
 		tabentry->changes_since_analyze = 0;
-
+		tabentry->refreshes_since_analyze = 0;
+	}
+	
 	if (msg->m_autovacuum)
 	{
 		tabentry->autovac_analyze_timestamp = msg->m_analyzetime;
@@ -5521,6 +5557,29 @@ pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len)
 	}
 }
 
+/* ----------
+ * pgstat_recv_refresh() -
+ *
+ *      Process a REFRESH message.
+ * ----------
+ */
+static void
+pgstat_recv_refresh(PgStat_MsgRefresh *msg, int len)
+{
+	PgStat_StatDBEntry *dbentry;
+	PgStat_StatTabEntry *tabentry;
+
+	/*
+	 * Store the data in the mview's hashtable entry.
+	 */
+	dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+	tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
+
+	tabentry->refresh_timestamp = msg->m_refreshtime;
+	tabentry->refresh_count++;
+	tabentry->refreshes_since_analyze++;
+}
 
 /* ----------
  * pgstat_recv_archiver() -
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index a987d0d..13f77ad 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -296,6 +296,24 @@ pg_stat_get_last_autoanalyze_time(PG_FUNCTION_ARGS)
 }
 
 Datum
+pg_stat_get_last_refresh_time(PG_FUNCTION_ARGS)
+{
+	Oid                     relid = PG_GETARG_OID(0);
+	TimestampTz result;
+	PgStat_StatTabEntry *tabentry;
+
+	if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+		result = 0;
+	else
+		result = tabentry->refresh_timestamp;
+
+	if (result == 0)
+		PG_RETURN_NULL();
+	else
+		PG_RETURN_TIMESTAMPTZ(result);
+}
+
+Datum
 pg_stat_get_vacuum_count(PG_FUNCTION_ARGS)
 {
 	Oid			relid = PG_GETARG_OID(0);
@@ -356,6 +374,21 @@ pg_stat_get_autoanalyze_count(PG_FUNCTION_ARGS)
 }
 
 Datum
+pg_stat_get_refresh_count(PG_FUNCTION_ARGS)
+{
+        Oid                     relid = PG_GETARG_OID(0);
+        int64           result;
+        PgStat_StatTabEntry *tabentry;
+
+        if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+                result = 0;
+        else
+                result = (int64) (tabentry->refresh_count);
+
+        PG_RETURN_INT64(result);
+}
+
+Datum
 pg_stat_get_function_calls(PG_FUNCTION_ARGS)
 {
 	Oid			funcid = PG_GETARG_OID(0);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 5d8fb2e..4ec36c0 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2632,6 +2632,15 @@ static struct config_int ConfigureNamesInt[] =
 		50, 0, INT_MAX,
 		NULL, NULL, NULL
 	},
+        {
+                {"autovacuum_analyze_refresh_threshold", PGC_SIGHUP, AUTOVACUUM,
+                        gettext_noop("Minimum number of materialized view refreshes prior to analyze."),
+                        NULL
+                },
+                &autovacuum_anl_ref_thresh,
+                1, 0, INT_MAX,
+                NULL, NULL, NULL
+        },
 	{
 		/* see varsup.c for why this is PGC_POSTMASTER not PGC_SIGHUP */
 		{"autovacuum_freeze_max_age", PGC_POSTMASTER, AUTOVACUUM,
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index bb7053a..1ed669a 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2766,6 +2766,10 @@ DATA(insert OID = 3056 ( pg_stat_get_analyze_count PGNSP PGUID 12 1 0 0 0 f f f
 DESCR("statistics: number of manual analyzes for a table");
 DATA(insert OID = 3057 ( pg_stat_get_autoanalyze_count PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_autoanalyze_count _null_ _null_ _null_ ));
 DESCR("statistics: number of auto analyzes for a table");
+DATA(insert OID = 3353 (  pg_stat_get_last_refresh_time PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 1184 "26" _null_ _null_ _null_ _null_ _null_  pg_stat_get_last_refresh_time _null_ _null_ _null_ ));
+DESCR("statistics: last refresh time for a materialized view");
+DATA(insert OID = 3354 ( pg_stat_get_refresh_count PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_refresh_count _null_ _null_ _null_ ));
+DESCR("statistics: number of manual vacuums for a table");
 DATA(insert OID = 1936 (  pg_stat_get_backend_idset		PGNSP PGUID 12 1 100 0 0 f f f f t t s r 0 0 23 "" _null_ _null_ _null_ _null_ _null_ pg_stat_get_backend_idset _null_ _null_ _null_ ));
 DESCR("statistics: currently active backend IDs");
 DATA(insert OID = 2022 (  pg_stat_get_activity			PGNSP PGUID 12 1 100 0 0 f f f f f t s r 1 0 2249 "23" "{23,26,23,26,25,25,25,25,25,1184,1184,1184,1184,869,25,23,28,28,16,25,25,23,16,25}" "{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}" "{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,ssl,sslversion,sslcipher,sslbits,sslcompression,sslclientdn}" _null_ _null_ pg_stat_get_activity _null_ _null_ _null_ ));
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 8b710ec..4b87e38 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -58,6 +58,7 @@ typedef enum StatMsgType
 	PGSTAT_MTYPE_AUTOVAC_START,
 	PGSTAT_MTYPE_VACUUM,
 	PGSTAT_MTYPE_ANALYZE,
+        PGSTAT_MTYPE_REFRESH,
 	PGSTAT_MTYPE_ARCHIVER,
 	PGSTAT_MTYPE_BGWRITER,
 	PGSTAT_MTYPE_FUNCSTAT,
@@ -391,6 +392,19 @@ typedef struct PgStat_MsgAnalyze
 
 
 /* ----------
+ * PgStat_MsgRefresh                    Sent by the backend after REFRESH 
+ * ----------
+ */
+typedef struct PgStat_MsgRefresh
+{
+        PgStat_MsgHdr m_hdr;
+        Oid                     m_databaseid;
+        Oid                     m_tableoid;
+        TimestampTz m_refreshtime;
+} PgStat_MsgRefresh;
+
+
+/* ----------
  * PgStat_MsgArchiver			Sent by the archiver to update statistics.
  * ----------
  */
@@ -549,6 +563,7 @@ typedef union PgStat_Msg
 	PgStat_MsgAutovacStart msg_autovacuum;
 	PgStat_MsgVacuum msg_vacuum;
 	PgStat_MsgAnalyze msg_analyze;
+        PgStat_MsgRefresh msg_refresh;
 	PgStat_MsgArchiver msg_archiver;
 	PgStat_MsgBgWriter msg_bgwriter;
 	PgStat_MsgFuncstat msg_funcstat;
@@ -641,6 +656,9 @@ typedef struct PgStat_StatTabEntry
 	PgStat_Counter analyze_count;
 	TimestampTz autovac_analyze_timestamp;		/* autovacuum initiated */
 	PgStat_Counter autovac_analyze_count;
+        TimestampTz refresh_timestamp;
+        PgStat_Counter refresh_count;
+	PgStat_Counter refreshes_since_analyze;
 } PgStat_StatTabEntry;
 
 
diff --git a/src/include/postmaster/autovacuum.h b/src/include/postmaster/autovacuum.h
index 99d7f09..e910c76 100644
--- a/src/include/postmaster/autovacuum.h
+++ b/src/include/postmaster/autovacuum.h
@@ -23,6 +23,7 @@ extern int	autovacuum_naptime;
 extern int	autovacuum_vac_thresh;
 extern double autovacuum_vac_scale;
 extern int	autovacuum_anl_thresh;
+extern int	autovacuum_anl_ref_thresh;
 extern double autovacuum_anl_scale;
 extern int	autovacuum_freeze_max_age;
 extern int	autovacuum_multixact_freeze_max_age;
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index a617a7c..0555aae 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -256,6 +256,7 @@ typedef struct AutoVacOpts
 	bool		enabled;
 	int			vacuum_threshold;
 	int			analyze_threshold;
+	int			analyze_ref_threshold;
 	int			vacuum_cost_delay;
 	int			vacuum_cost_limit;
 	int			freeze_min_age;
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9f876ae..d4517ab 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -1549,6 +1549,7 @@ PgStat_MsgFuncstat
 PgStat_MsgHdr
 PgStat_MsgInquiry
 PgStat_MsgRecoveryConflict
+PgStat_MsgRefresh
 PgStat_MsgResetcounter
 PgStat_MsgResetsharedcounter
 PgStat_MsgResetsinglecounter
#2Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Jim Mlodgenski (#1)
Re: mat views stats

On 2/20/17 10:06, Jim Mlodgenski wrote:

I've come across a number of times where the statistics on materialized
views become stale producing bad plans. It turns out that autovacuum
only touches a materialized view when it is first created and ignores it
on a refresh. When you have a materialized view like yesterdays_sales
the data in the materialized view turns over every day.

That sounds like a bug.

Attached is a patch to trigger autovacuum based on a matview refresh
along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

I think these should be two separate patches. We might want to
backpatch the first one.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#3Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Eisentraut (#2)
Re: mat views stats

On 2/21/17 4:22 PM, Peter Eisentraut wrote:

Attached is a patch to trigger autovacuum based on a matview refresh
along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

Certainly easier, but I don't think it'd be better. Matviews really
aren't the same thing as tables. Off-hand (without reviewing the patch),
update and delete counts certainly wouldn't make any sense. "Insert"
counts might, in as much as it's how many rows have been added by
refreshes. You'd want a refresh count too.

I think these should be two separate patches. We might want to
backpatch the first one.

+1; definitely sounds like a bug to me.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Jim Mlodgenski
jimmy76@gmail.com
In reply to: Jim Nasby (#3)
Re: mat views stats

On Wed, Feb 22, 2017 at 12:43 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

On 2/21/17 4:22 PM, Peter Eisentraut wrote:

Attached is a patch to trigger autovacuum based on a matview refresh

along with a system view pg_stat_all_matviews to show information more
meaningful for materialized views.

It might be easier to include materialized views into pg_stat_*_tables.

Certainly easier, but I don't think it'd be better. Matviews really aren't
the same thing as tables. Off-hand (without reviewing the patch), update
and delete counts certainly wouldn't make any sense. "Insert" counts might,
in as much as it's how many rows have been added by refreshes. You'd want a
refresh count too.

Matviews already show up in the pg_stat_*_tables and the patch does
leverage the existing pg_stat_*_tables underlying structure, but it creates
more meaningful pg_stat_*_matviews leaving out things like insert and
update counts.

I think these should be two separate patches. We might want to

backpatch the first one.

I was originally thinking 2 patches, but I couldn't think of a way to
trigger the analyze reliably without adding a refresh count or sending
bogus stats. We can certainly send a stats message containing the number of
rows inserted by the refresh, but are we going to also send the number of
deletes as well? Consider a matview that has month to date data. At the end
of the month, there will be about 30n live tuples. The next day on the new
month, there will be n inserts with the stats thinking there are 30n live
tuples which is below the analyze scale factor. We want to analyze the
matview on the first of the day of the new month, but it wouldn't be
triggered for a few days. We can have REFRESH also track live tuples, but
it was quickly becoming a slippery slope of changing behavior for a back
patch. Maybe that's OK and we can go down that road.

We can back patch some documentation about the existing refresh behavior
with autovacuum.

#5Peter Eisentraut
peter.eisentraut@2ndquadrant.com
In reply to: Jim Mlodgenski (#4)
Re: mat views stats

On 2/22/17 06:31, Jim Mlodgenski wrote:

Matviews already show up in the pg_stat_*_tables and the patch does
leverage the existing pg_stat_*_tables underlying structure, but it
creates more meaningful pg_stat_*_matviews leaving out things like
insert and update counts.

But fields like seq_scans and last_analyze are then redundant between
the *_tables view and the *_matviews view. Maybe it would make more
sense to introduce a new view like you propose and not show them in
*_tables anymore?

I was originally thinking 2 patches, but I couldn't think of a way to
trigger the analyze reliably without adding a refresh count or sending
bogus stats. We can certainly send a stats message containing the number
of rows inserted by the refresh, but are we going to also send the
number of deletes as well? Consider a matview that has month to date
data. At the end of the month, there will be about 30n live tuples. The
next day on the new month, there will be n inserts with the stats
thinking there are 30n live tuples which is below the analyze scale
factor. We want to analyze the matview on the first of the day of the
new month, but it wouldn't be triggered for a few days. We can have
REFRESH also track live tuples, but it was quickly becoming a slippery
slope of changing behavior for a back patch. Maybe that's OK and we can
go down that road.

For those not reading the patch, it introduces a new reloption
autovacuum_analyze_refresh_threshold that determines when to autoanalyze
a materialized view.

What behavior would we like by default? Refreshing a materialized view
is a pretty expensive operation, so I think scheduling an analyze quite
aggressively right afterwards is often what you want.

I think sending a stats message with the number of inserted rows could
make sense.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Jim Nasby
Jim.Nasby@BlueTreble.com
In reply to: Peter Eisentraut (#5)
Re: mat views stats

On 2/22/17 7:56 AM, Peter Eisentraut wrote:

What behavior would we like by default? Refreshing a materialized view
is a pretty expensive operation, so I think scheduling an analyze quite
aggressively right afterwards is often what you want.

I think sending a stats message with the number of inserted rows could
make sense.

+1 on both counts. And if sane analyze behavior does depend on the stats 
changes then there's no real advantage to a separate patch.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Robert Haas
robertmhaas@gmail.com
In reply to: Jim Nasby (#3)
Re: mat views stats

On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

Certainly easier, but I don't think it'd be better. Matviews really aren't
the same thing as tables. Off-hand (without reviewing the patch), update and
delete counts certainly wouldn't make any sense. "Insert" counts might, in
as much as it's how many rows have been added by refreshes. You'd want a
refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contents. So I think all the same counters that make sense for
regular tables are also sensible here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Jim Mlodgenski
jimmy76@gmail.com
In reply to: Robert Haas (#7)
1 attachment(s)
Re: mat views stats

On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

Certainly easier, but I don't think it'd be better. Matviews really

aren't

the same thing as tables. Off-hand (without reviewing the patch), update

and

delete counts certainly wouldn't make any sense. "Insert" counts might,

in

as much as it's how many rows have been added by refreshes. You'd want a
refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contents. So I think all the same counters that make sense for
regular tables are also sensible here.

After digging into things further, just making refresh report the stats for
what is it basically doing simplifies and solves it and it is something we
can back patch if that the consensus. See the attached patch.

Attachments:

refresh_matview_stats.patchtext/x-patch; charset=US-ASCII; name=refresh_matview_stats.patchDownload
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a18c917..4383312 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -30,6 +30,7 @@
 #include "executor/spi.h"
 #include "miscadmin.h"
 #include "parser/parse_relation.h"
+#include "pgstat.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/lmgr.h"
 #include "storage/smgr.h"
@@ -59,7 +60,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty
 static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
-static void refresh_matview_datafill(DestReceiver *dest, Query *query,
+static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
 						 const char *queryString);
 
 static char *make_temptable_name_n(char *tempname, int n);
@@ -151,6 +152,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+        uint64          processed = 0;
 	ObjectAddress address;
 
 	/* Determine strength of lock needed. */
@@ -322,7 +324,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 
 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString);
+		processed = refresh_matview_datafill(dest, dataQuery, queryString);
 
 	heap_close(matviewRel, NoLock);
 
@@ -345,8 +347,17 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 		Assert(matview_maintenance_depth == old_depth);
 	}
 	else
+        {
 		refresh_by_heap_swap(matviewOid, OIDNewHeap, relpersistence);
 
+                /* 
+                 * Send the stats to mimic what we are essentially doing. 
+                 * A truncate and insert 
+                 */
+                pgstat_count_truncate(matviewRel);
+                pgstat_count_heap_insert(matviewRel, processed);
+        }
+
 	/* Roll back any GUC changes */
 	AtEOXact_GUC(false, save_nestlevel);
 
@@ -361,7 +372,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 /*
  * refresh_matview_datafill
  */
-static void
+static uint64 
 refresh_matview_datafill(DestReceiver *dest, Query *query,
 						 const char *queryString)
 {
@@ -369,6 +380,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	PlannedStmt *plan;
 	QueryDesc  *queryDesc;
 	Query	   *copied_query;
+        uint64     processed;
 
 	/* Lock and rewrite, using a copy to preserve the original query. */
 	copied_query = copyObject(query);
@@ -406,6 +418,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	/* run the plan */
 	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
 
+        processed = queryDesc->estate->es_processed;
+
 	/* and clean up */
 	ExecutorFinish(queryDesc);
 	ExecutorEnd(queryDesc);
@@ -413,6 +427,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	FreeQueryDesc(queryDesc);
 
 	PopActiveSnapshot();
+
+        return processed;
 }
 
 DestReceiver *
#9Michael Paquier
michael.paquier@gmail.com
In reply to: Jim Mlodgenski (#8)
Re: mat views stats

On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski <jimmy76@gmail.com> wrote:

On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

Certainly easier, but I don't think it'd be better. Matviews really
aren't
the same thing as tables. Off-hand (without reviewing the patch), update
and
delete counts certainly wouldn't make any sense. "Insert" counts might,
in
as much as it's how many rows have been added by refreshes. You'd want a
refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contrs that make sense for
regular tables are also sensible here.

After digging into things further, just making refresh report the stats for
what is it basically doing simplifies and solves it and it is something we
can back patch if that the consensus. See the attached patch.

This is unhappy:
$ git diff master --check
src/backend/commands/matview.c:155: indent with spaces.
+ uint64 processed = 0;

+                /*
+                 * Send the stats to mimic what we are essentially doing.
+                 * A truncate and insert
+                 */
This sentence is unfinished.

There is also no need to report the number of inserts if WITH NO DATA is used.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Jim Mlodgenski
jimmy76@gmail.com
In reply to: Michael Paquier (#9)
1 attachment(s)
Re: mat views stats

On Wed, Mar 1, 2017 at 8:39 PM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Thu, Mar 2, 2017 at 7:20 AM, Jim Mlodgenski <jimmy76@gmail.com> wrote:

On Sun, Feb 26, 2017 at 11:49 AM, Robert Haas <robertmhaas@gmail.com>

wrote:

On Wed, Feb 22, 2017 at 11:13 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
wrote:

Certainly easier, but I don't think it'd be better. Matviews really
aren't
the same thing as tables. Off-hand (without reviewing the patch),

update

and
delete counts certainly wouldn't make any sense. "Insert" counts

might,

in
as much as it's how many rows have been added by refreshes. You'd

want a

refresh count too.

Regular REFRESH truncates the view and repopulates it, but REFRESH
CONCURRENTLY does inserts, updates, and deletes as needed to adjust
the contrs that make sense for
regular tables are also sensible here.

After digging into things further, just making refresh report the stats

for

what is it basically doing simplifies and solves it and it is something

we

can back patch if that the consensus. See the attached patch.

This is unhappy:
$ git diff master --check
src/backend/commands/matview.c:155: indent with spaces.
+ uint64 processed = 0;

+                /*
+                 * Send the stats to mimic what we are essentially doing.
+                 * A truncate and insert
+                 */
This sentence is unfinished.

There is also no need to report the number of inserts if WITH NO DATA is
used.

Here is the cleaned up patch

Attachments:

refresh_matview_stats_v2.patchtext/x-patch; charset=US-ASCII; name=refresh_matview_stats_v2.patchDownload
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index a18c917..94a69dd 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -30,6 +30,7 @@
 #include "executor/spi.h"
 #include "miscadmin.h"
 #include "parser/parse_relation.h"
+#include "pgstat.h"
 #include "rewrite/rewriteHandler.h"
 #include "storage/lmgr.h"
 #include "storage/smgr.h"
@@ -59,7 +60,7 @@ static void transientrel_startup(DestReceiver *self, int operation, TupleDesc ty
 static bool transientrel_receive(TupleTableSlot *slot, DestReceiver *self);
 static void transientrel_shutdown(DestReceiver *self);
 static void transientrel_destroy(DestReceiver *self);
-static void refresh_matview_datafill(DestReceiver *dest, Query *query,
+static uint64 refresh_matview_datafill(DestReceiver *dest, Query *query,
 						 const char *queryString);
 
 static char *make_temptable_name_n(char *tempname, int n);
@@ -151,6 +152,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	Oid			save_userid;
 	int			save_sec_context;
 	int			save_nestlevel;
+	uint64          processed = 0;
 	ObjectAddress address;
 
 	/* Determine strength of lock needed. */
@@ -322,7 +324,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 
 	/* Generate the data, if wanted. */
 	if (!stmt->skipData)
-		refresh_matview_datafill(dest, dataQuery, queryString);
+		processed = refresh_matview_datafill(dest, dataQuery, queryString);
 
 	heap_close(matviewRel, NoLock);
 
@@ -345,8 +347,18 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 		Assert(matview_maintenance_depth == old_depth);
 	}
 	else
+	{
 		refresh_by_heap_swap(matviewOid, OIDNewHeap, relpersistence);
 
+		/*
+		 * Send the stats to mimic what we are essentially doing. Swapping the heap
+		 * is equilivant to truncating the relation and inserting the new data.
+		 */
+		pgstat_count_truncate(matviewRel);
+		if (!stmt->skipData)
+			pgstat_count_heap_insert(matviewRel, processed);
+	}
+
 	/* Roll back any GUC changes */
 	AtEOXact_GUC(false, save_nestlevel);
 
@@ -361,7 +373,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 /*
  * refresh_matview_datafill
  */
-static void
+static uint64
 refresh_matview_datafill(DestReceiver *dest, Query *query,
 						 const char *queryString)
 {
@@ -369,6 +381,7 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	PlannedStmt *plan;
 	QueryDesc  *queryDesc;
 	Query	   *copied_query;
+	uint64     processed;
 
 	/* Lock and rewrite, using a copy to preserve the original query. */
 	copied_query = copyObject(query);
@@ -406,6 +419,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	/* run the plan */
 	ExecutorRun(queryDesc, ForwardScanDirection, 0L);
 
+	processed = queryDesc->estate->es_processed;
+
 	/* and clean up */
 	ExecutorFinish(queryDesc);
 	ExecutorEnd(queryDesc);
@@ -413,6 +428,8 @@ refresh_matview_datafill(DestReceiver *dest, Query *query,
 	FreeQueryDesc(queryDesc);
 
 	PopActiveSnapshot();
+
+	return processed;
 }
 
 DestReceiver *
#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Jim Mlodgenski (#10)
Re: mat views stats

Jim Mlodgenski <jimmy76@gmail.com> writes:

After digging into things further, just making refresh report the stats
for what is it basically doing simplifies and solves it and it is
something we can back patch if that the consensus. See the attached
patch.

I've pushed this into HEAD with one non-cosmetic change: the patch tried
to pass a uint64 tuple count to pgstat_count_heap_insert(), whose argument
was only declared as "int". This would go seriously wrong with matviews
having more than INT_MAX rows, which hardly seems out of the question,
so I changed pgstat_count_heap_insert() to take int64 instead.

I don't think we can make that change in the back branches though.
It seems too likely that third-party code might be calling
pgstat_count_heap_insert().

We could possibly kluge around this to produce a safe-to-back-patch
fix by doing something like

pgstat_count_heap_insert(matviewRel, (int) Min(processed, INT_MAX));

But that seems pretty ugly. Given the lack of previous reports, I'm
personally content to leave this unfixed in the back branches.

Comments?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Jim Mlodgenski
jimmy76@gmail.com
In reply to: Tom Lane (#11)
Re: mat views stats

But that seems pretty ugly. Given the lack of previous reports, I'm
personally content to leave this unfixed in the back branches.

Comments?

Most instances of this I've seen out in the field have worked around this

by just running analyze in the scheduled jobs after the refresh so we're
probably good not back patching.