VACUUM's ancillary tasks
The attached two patches scratch two itches I've been having for a
while. I'm attaching them together because the second depends on the first.
Both deal with the fact that [auto]vacuum has taken on more roles than
its original purpose.
Patch One: autovacuum insert-heavy tables
If you have a table that mostly receives INSERTs, it will never get
vacuumed because there are no (or few) dead rows. I have added an
"inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
the same way as "changes_since_analyze" does.
The reason such a table needs to be vacuumed is currently twofold: the
visibility map is not updated, slowing down index-only scans; and BRIN
indexes are not maintained, rendering them basically useless.
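To illustrate (a sketch; the table name is made up, and n_ins_since_vacuum
is the pg_stat view column this patch adds):

    CREATE TABLE insert_only (id int, payload text);
    INSERT INTO insert_only SELECT g, 'x' FROM generate_series(1, 100000) g;

    -- n_dead_tup stays at zero, so the current threshold never fires;
    -- with the patch the inserts are counted as well.
    SELECT n_live_tup, n_dead_tup, n_ins_since_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname = 'insert_only';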
Patch Two: autovacuum after table rewrites
This patch addresses the absurdity that a standard VACUUM is required
after a VACUUM FULL because the visibility map gets blown away. This is
also the case for CLUSTER and some versions of ALTER TABLE that rewrite
the table.
I thought about having those commands do the same work themselves, but
it seems better to have them simply trigger autovacuum than
quadruplicate the work. I do this by having them fill in the
"inserts_since_vacuum" field added in Patch One with the number of rows
rewritten. This assumes that autovacuum_vacuum_scale_factor is < 1.0,
which hopefully is a safe assumption.
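To spell out the arithmetic with the default settings (which I am assuming
here): after rewriting, say, a million rows,

    vacuum threshold = autovacuum_vacuum_threshold
                     + autovacuum_vacuum_scale_factor * reltuples
                     = 50 + 0.2 * 1000000 = 200050

so inserts_since_vacuum = 1000000 exceeds the threshold and autovacuum
fires; with a scale factor of 1.0 or more, the rewritten row count could
never exceed the threshold.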
While doing this, I noticed that ALTER TABLE should also re-analyze the
table for obvious reasons, so I have that one set
"changes_since_analyze" to the number of rows rewritten as well.
I have not included any kind of test suite here because I don't really
have any ideas how to go about it in a sane way. Suggestions welcome.
Attention reviewer: Please note that some of the documentation in the
first patch gets removed by the second patch, in case they don't both
get committed.
I have added this to the imminent commitfest. These patches are rebased
as of 26fa446.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Attachments:
autovac_01_v01.patch
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 2713883..4881f18 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -373,6 +373,14 @@
The visibility map is vastly smaller than the heap, so it can easily be
cached even when the heap is very large.
</para>
+
+ <para>
+ Note that operations that rewrite the table, such as <command>VACUUM
+ FULL</command>, <command>CLUSTER</command>, and some versions of
+ <command>ALTER TABLE</command> also reset the visibility map.
+ It is recommended to always issue a standard <command>VACUUM</command>
+ after such commands to rebuild it.
+ </para>
</sect2>
<sect2 id="vacuum-for-wraparound">
@@ -476,8 +484,7 @@
<varname>autovacuum_freeze_max_age</> minus
<varname>vacuum_freeze_min_age</> transactions.
For tables that are regularly vacuumed for space reclamation purposes,
- this is of little importance. However, for static tables
- (including tables that receive inserts, but no updates or deletes),
+ this is of little importance. However, for static tables,
there is no need to vacuum for space reclamation, so it can
be useful to try to maximize the interval between forced autovacuums
on very large static tables. Obviously one can do this either by
@@ -739,7 +746,7 @@ HINT: Stop the postmaster and vacuum that database in single-user mode.
<xref linkend="guc-autovacuum-freeze-max-age"> transactions old are always
vacuumed (this also applies to those tables whose freeze max age has
been modified via storage parameters; see below). Otherwise, if the
- number of tuples obsoleted since the last
+ number of tuples obsoleted or inserted since the last
<command>VACUUM</command> exceeds the <quote>vacuum threshold</quote>, the
table is vacuumed. The vacuum threshold is defined as:
<programlisting>
@@ -751,10 +758,11 @@ vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuple
<xref linkend="guc-autovacuum-vacuum-scale-factor">,
and the number of tuples is
<structname>pg_class</structname>.<structfield>reltuples</structfield>.
- The number of obsolete tuples is obtained from the statistics
- collector; it is a semi-accurate count updated by each
- <command>UPDATE</command> and <command>DELETE</command> operation. (It
- is only semi-accurate because some information might be lost under heavy
+ The number of obsoleted and inserted tuples is obtained from the statistics
+ collector; they are semi-accurate counts updated by each
+ <command>INSERT</command>, <command>UPDATE</command>,
+ and <command>DELETE</command> operation. (They are
+ only semi-accurate because some information might be lost under heavy
load.) If the <structfield>relfrozenxid</> value of the table is more
than <varname>vacuum_freeze_table_age</> transactions old, an aggressive
vacuum is performed to freeze old tuples and advance
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0776428..646e0d8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1818,6 +1818,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry>Estimated number of dead rows</entry>
</row>
<row>
+ <entry><structfield>n_ins_since_vacuum</></entry>
+ <entry><type>bigint</></entry>
+ <entry>Estimated number of rows inserted since this table was last vacuumed</entry>
+ </row>
+ <row>
<entry><structfield>n_mod_since_analyze</></entry>
<entry><type>bigint</></entry>
<entry>Estimated number of rows modified since this table was last analyzed</entry>
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ada2142..3696faa 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -457,6 +457,7 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_tuples_hot_updated(C.oid) AS n_tup_hot_upd,
pg_stat_get_live_tuples(C.oid) AS n_live_tup,
pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
+ pg_stat_get_ins_since_vacuum(C.oid) AS n_ins_since_vacuum,
pg_stat_get_mod_since_analyze(C.oid) AS n_mod_since_analyze,
pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3768f50..7ebcc05 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2681,9 +2681,10 @@ relation_needs_vacanalyze(Oid relid,
float4 vacthresh,
anlthresh;
- /* number of vacuum (resp. analyze) tuples at this time */
+ /* number of vacuum/analyze/inserted tuples at this time */
float4 vactuples,
- anltuples;
+ anltuples,
+ instuples;
/* freeze parameters */
int freeze_max_age;
@@ -2764,6 +2765,7 @@ relation_needs_vacanalyze(Oid relid,
reltuples = classForm->reltuples;
vactuples = tabentry->n_dead_tuples;
anltuples = tabentry->changes_since_analyze;
+ instuples = tabentry->inserts_since_vacuum;
vacthresh = (float4) vac_base_thresh + vac_scale_factor * reltuples;
anlthresh = (float4) anl_base_thresh + anl_scale_factor * reltuples;
@@ -2773,12 +2775,12 @@ relation_needs_vacanalyze(Oid relid,
* reset, because if that happens, the last vacuum and analyze counts
* will be reset too.
*/
- elog(DEBUG3, "%s: vac: %.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
+ elog(DEBUG3, "%s: vac/ins: %.0f/%.0f (threshold %.0f), anl: %.0f (threshold %.0f)",
NameStr(classForm->relname),
- vactuples, vacthresh, anltuples, anlthresh);
+ vactuples, instuples, vacthresh, anltuples, anlthresh);
/* Determine if this table needs vacuum or analyze. */
- *dovacuum = force_vacuum || (vactuples > vacthresh);
+ *dovacuum = force_vacuum || (vactuples > vacthresh) || (instuples > vacthresh);
*doanalyze = (anltuples > anlthresh);
}
else
@@ -2792,7 +2794,7 @@ relation_needs_vacanalyze(Oid relid,
*doanalyze = false;
}
- /* ANALYZE refuses to work with pg_statistics */
+ /* ANALYZE refuses to work with pg_statistic */
if (relid == StatisticRelationId)
*doanalyze = false;
}
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 8fa9edb..169be3f 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3855,6 +3855,7 @@ pgstat_get_tab_entry(PgStat_StatDBEntry *dbentry, Oid tableoid, bool create)
result->tuples_hot_updated = 0;
result->n_live_tuples = 0;
result->n_dead_tuples = 0;
+ result->inserts_since_vacuum = 0;
result->changes_since_analyze = 0;
result->blocks_fetched = 0;
result->blocks_hit = 0;
@@ -4966,6 +4967,7 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
tabentry->tuples_hot_updated = tabmsg->t_counts.t_tuples_hot_updated;
tabentry->n_live_tuples = tabmsg->t_counts.t_delta_live_tuples;
tabentry->n_dead_tuples = tabmsg->t_counts.t_delta_dead_tuples;
+ tabentry->inserts_since_vacuum = tabmsg->t_counts.t_tuples_inserted;
tabentry->changes_since_analyze = tabmsg->t_counts.t_changed_tuples;
tabentry->blocks_fetched = tabmsg->t_counts.t_blocks_fetched;
tabentry->blocks_hit = tabmsg->t_counts.t_blocks_hit;
@@ -4999,6 +5001,7 @@ pgstat_recv_tabstat(PgStat_MsgTabstat *msg, int len)
}
tabentry->n_live_tuples += tabmsg->t_counts.t_delta_live_tuples;
tabentry->n_dead_tuples += tabmsg->t_counts.t_delta_dead_tuples;
+ tabentry->inserts_since_vacuum += tabmsg->t_counts.t_tuples_inserted;
tabentry->changes_since_analyze += tabmsg->t_counts.t_changed_tuples;
tabentry->blocks_fetched += tabmsg->t_counts.t_blocks_fetched;
tabentry->blocks_hit += tabmsg->t_counts.t_blocks_hit;
@@ -5234,6 +5237,12 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
tabentry->n_live_tuples = msg->m_live_tuples;
tabentry->n_dead_tuples = msg->m_dead_tuples;
+ /*
+ * We reset inserts_since_vacuum to zero, forgetting any changes that
+ * occurred while the VACUUM was in progress.
+ */
+ tabentry->inserts_since_vacuum = 0;
+
if (msg->m_autovacuum)
{
tabentry->autovac_vacuum_timestamp = msg->m_vacuumtime;
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 1bba5fa..982b11e 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -39,6 +39,7 @@ extern Datum pg_stat_get_tuples_deleted(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_tuples_hot_updated(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_live_tuples(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_dead_tuples(PG_FUNCTION_ARGS);
+extern Datum pg_stat_get_ins_since_vacuum(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS);
extern Datum pg_stat_get_blocks_hit(PG_FUNCTION_ARGS);
@@ -277,6 +278,22 @@ pg_stat_get_dead_tuples(PG_FUNCTION_ARGS)
Datum
+pg_stat_get_ins_since_vacuum(PG_FUNCTION_ARGS)
+{
+ Oid relid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatTabEntry *tabentry;
+
+ if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (tabentry->inserts_since_vacuum);
+
+ PG_RETURN_INT64(result);
+}
+
+
+Datum
pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS)
{
Oid relid = PG_GETARG_OID(0);
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index e2d08ba..3a999ce 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2739,6 +2739,8 @@ DATA(insert OID = 2878 ( pg_stat_get_live_tuples PGNSP PGUID 12 1 0 0 0 f f f f
DESCR("statistics: number of live tuples");
DATA(insert OID = 2879 ( pg_stat_get_dead_tuples PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_dead_tuples _null_ _null_ _null_ ));
DESCR("statistics: number of dead tuples");
+DATA(insert OID = 4032 ( pg_stat_get_ins_since_vacuum PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_ins_since_vacuum _null_ _null_ _null_ ));
+DESCR("statistics: number of tuples inserted since last vacuum");
DATA(insert OID = 3177 ( pg_stat_get_mod_since_analyze PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_mod_since_analyze _null_ _null_ _null_ ));
DESCR("statistics: number of tuples changed since last analyze");
DATA(insert OID = 1934 ( pg_stat_get_blocks_fetched PGNSP PGUID 12 1 0 0 0 f f f f t f s r 1 0 20 "26" _null_ _null_ _null_ _null_ _null_ pg_stat_get_blocks_fetched _null_ _null_ _null_ ));
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index dc3320d..d491c71 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -628,6 +628,7 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter n_live_tuples;
PgStat_Counter n_dead_tuples;
+ PgStat_Counter inserts_since_vacuum;
PgStat_Counter changes_since_analyze;
PgStat_Counter blocks_fetched;
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 00700f2..45a0b07 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1693,6 +1693,7 @@ pg_stat_all_tables| SELECT c.oid AS relid,
pg_stat_get_tuples_hot_updated(c.oid) AS n_tup_hot_upd,
pg_stat_get_live_tuples(c.oid) AS n_live_tup,
pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
+ pg_stat_get_ins_since_vacuum(c.oid) AS n_ins_since_vacuum,
pg_stat_get_mod_since_analyze(c.oid) AS n_mod_since_analyze,
pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,
pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,
@@ -1826,6 +1827,7 @@ pg_stat_sys_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_tup_hot_upd,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
+ pg_stat_all_tables.n_ins_since_vacuum,
pg_stat_all_tables.n_mod_since_analyze,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
@@ -1869,6 +1871,7 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_tup_hot_upd,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
+ pg_stat_all_tables.n_ins_since_vacuum,
pg_stat_all_tables.n_mod_since_analyze,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
autovac_02_v01.patch
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 4881f18..7fd7f46 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -373,14 +373,6 @@
The visibility map is vastly smaller than the heap, so it can easily be
cached even when the heap is very large.
</para>
-
- <para>
- Note that operations that rewrite the table, such as <command>VACUUM
- FULL</command>, <command>CLUSTER</command>, and some versions of
- <command>ALTER TABLE</command> also reset the visibility map.
- It is recommended to always issue a standard <command>VACUUM</command>
- after such commands to rebuild it.
- </para>
</sect2>
<sect2 id="vacuum-for-wraparound">
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 43bbd90..7663108 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -38,6 +38,7 @@
#include "commands/vacuum.h"
#include "miscadmin.h"
#include "optimizer/planner.h"
+#include "pgstat.h"
#include "storage/bufmgr.h"
#include "storage/lmgr.h"
#include "storage/predicate.h"
@@ -1096,6 +1097,12 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
tups_recently_dead,
pg_rusage_show(&ru0))));
+ /* Send statistics */
+ pgstat_report_table_rewrite(OIDOldHeap,
+ OldHeap->rd_rel->relisshared,
+ num_tuples,
+ false);
+
/* Clean up */
pfree(values);
pfree(isnull);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 86e9814..cad7504 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -317,7 +317,7 @@ static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
AlterTableCmd *cmd, LOCKMODE lockmode);
static void ATRewriteTables(AlterTableStmt *parsetree,
List **wqueue, LOCKMODE lockmode);
-static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
+static double ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel);
static void ATSimplePermissions(Relation rel, int allowed_targets);
static void ATWrongRelkindError(Relation rel, int allowed_targets);
@@ -3770,6 +3770,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
Oid OIDNewHeap;
Oid NewTableSpace;
char persistence;
+ double num_tuples;
OldHeap = heap_open(tab->relid, NoLock);
@@ -3855,7 +3856,7 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
* modifications, and test the current data within the table
* against new constraints generated by ALTER TABLE commands.
*/
- ATRewriteTable(tab, OIDNewHeap, lockmode);
+ num_tuples = ATRewriteTable(tab, OIDNewHeap, lockmode);
/*
* Swap the physical files of the old and new heaps, then rebuild
@@ -3871,6 +3872,9 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
RecentXmin,
ReadNextMultiXactId(),
persistence);
+
+ /* Send statistics */
+ pgstat_report_table_rewrite(tab->relid, false, num_tuples, true);
}
else
{
@@ -3942,8 +3946,10 @@ ATRewriteTables(AlterTableStmt *parsetree, List **wqueue, LOCKMODE lockmode)
* ATRewriteTable: scan or rewrite one table
*
* OIDNewHeap is InvalidOid if we don't need to rewrite
+ *
+ * The return value is the number of rows rewritten
*/
-static void
+static double
ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
{
Relation oldrel;
@@ -3958,6 +3964,7 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
CommandId mycid;
BulkInsertState bistate;
int hi_options;
+ double num_tuples = 0;
/*
* Open the relation(s). We have surely already locked the existing
@@ -4213,7 +4220,10 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
/* Write the tuple out to the new relation */
if (newrel)
+ {
heap_insert(newrel, tuple, mycid, hi_options, bistate);
+ num_tuples += 1;
+ }
ResetExprContext(econtext);
@@ -4241,6 +4251,8 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
heap_close(newrel, NoLock);
}
+
+ return num_tuples;
}
/*
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 169be3f..72f17c3 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -286,6 +286,7 @@ static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, in
static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
+static void pgstat_recv_table_rewrite(PgStat_MsgTableRewrite *msg, int len);
static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
@@ -1393,6 +1394,33 @@ pgstat_report_analyze(Relation rel,
}
/* --------
+ * pgstat_report_table_rewrite() -
+ *
+ * Tell the collector about a table rewrite, due to CLUSTER, VACUUM FULL, or
+ * ALTER TABLE.
+ *
+ * Caller must provide the number of rows that have been rewritten in hopes of
+ * triggering a VACUUM, and also whether to try triggering an ANALYZE.
+ * --------
+ */
+void
+pgstat_report_table_rewrite(Oid tableoid, bool shared, double rows_rewritten,
+ bool trigger_analyze)
+{
+ PgStat_MsgTableRewrite msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_TABLEREWRITE);
+ msg.m_databaseid = shared ? InvalidOid : MyDatabaseId;
+ msg.m_tableoid = tableoid;
+ msg.m_rows_rewritten = rows_rewritten;
+ msg.m_trigger_analyze = trigger_analyze;
+ pgstat_send(&msg, sizeof(msg));
+}
+
+/* --------
* pgstat_report_recovery_conflict() -
*
* Tell the collector about a Hot Standby recovery conflict.
@@ -3644,6 +3672,10 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_analyze((PgStat_MsgAnalyze *) &msg, len);
break;
+ case PGSTAT_MTYPE_TABLEREWRITE:
+ pgstat_recv_table_rewrite((PgStat_MsgTableRewrite *) &msg, len);
+ break;
+
case PGSTAT_MTYPE_ARCHIVER:
pgstat_recv_archiver((PgStat_MsgArchiver *) &msg, len);
break;
@@ -5297,6 +5329,32 @@ pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len)
}
}
+/* ----------
+ * pgstat_recv_table_rewrite() -
+ *
+ * Process a table rewrite message.
+ * ----------
+ */
+static void
+pgstat_recv_table_rewrite(PgStat_MsgTableRewrite *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+
+ /*
+ * Store the data in the table's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+
+ tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
+
+ tabentry->n_live_tuples = msg->m_rows_rewritten;
+ tabentry->n_dead_tuples = 0;
+ tabentry->inserts_since_vacuum = msg->m_rows_rewritten;
+ if (msg->m_trigger_analyze)
+ tabentry->changes_since_analyze = msg->m_rows_rewritten;
+}
+
/* ----------
* pgstat_recv_archiver() -
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index d491c71..1731fb4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -58,6 +58,7 @@ typedef enum StatMsgType
PGSTAT_MTYPE_AUTOVAC_START,
PGSTAT_MTYPE_VACUUM,
PGSTAT_MTYPE_ANALYZE,
+ PGSTAT_MTYPE_TABLEREWRITE,
PGSTAT_MTYPE_ARCHIVER,
PGSTAT_MTYPE_BGWRITER,
PGSTAT_MTYPE_FUNCSTAT,
@@ -391,6 +392,22 @@ typedef struct PgStat_MsgAnalyze
/* ----------
+ * PgStat_MsgTableRewrite Sent by the backend after a table rewrite,
+ * such as CLUSTER, VACUUM FULL, or
+ * ALTER TABLE
+ * ----------
+ */
+typedef struct PgStat_MsgTableRewrite
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ Oid m_tableoid;
+ double m_rows_rewritten;
+ bool m_trigger_analyze;
+} PgStat_MsgTableRewrite;
+
+
+/* ----------
* PgStat_MsgArchiver Sent by the archiver to update statistics.
* ----------
*/
@@ -974,6 +991,8 @@ extern void pgstat_report_vacuum(Oid tableoid, bool shared,
extern void pgstat_report_analyze(Relation rel,
PgStat_Counter livetuples, PgStat_Counter deadtuples,
bool resetcounter);
+extern void pgstat_report_table_rewrite(Oid tableoid, bool shared,
+ double rows_rewritten, bool trigger_analyze);
extern void pgstat_report_recovery_conflict(int reason);
extern void pgstat_report_deadlock(void);
Hi,
On 2016-08-29 03:26:06 +0200, Vik Fearing wrote:
> The attached two patches scratch two itches I've been having for a
> while. I'm attaching them together because the second depends on the
> first. Both deal with the fact that [auto]vacuum has taken on more
> roles than its original purpose.
>
> Patch One: autovacuum insert-heavy tables
>
> If you have a table that mostly receives INSERTs, it will never get
> vacuumed because there are no (or few) dead rows. I have added an
> "inserts_since_vacuum" field to PgStat_StatTabEntry which works exactly
> the same way as "changes_since_analyze" does.
>
> The reason such a table needs to be vacuumed is currently twofold: the
> visibility map is not updated, slowing down index-only scans; and BRIN
> indexes are not maintained, rendering them basically useless.
It might be worthwhile to look at
http://archives.postgresql.org/message-id/CAMkU%3D1zGu5OshfzxKBqDmxxKcoDJu4pJux8UAo5h7k%2BGA_jS3Q%40mail.gmail.com
there's definitely some overlap.
> Patch Two: autovacuum after table rewrites
>
> This patch addresses the absurdity that a standard VACUUM is required
> after a VACUUM FULL because the visibility map gets blown away. This is
> also the case for CLUSTER and some versions of ALTER TABLE that rewrite
> the table.
I think this should rather be fixed by maintaining the VM during
cluster. IIRC there was an attempt late in the 9.5 cycle, but Bruce
(IIRC) ran out of steam. And nobody picked it up again ... :(
Greetings,
Andres Freund
On Monday, 29 August 2016, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2016-08-29 03:26:06 +0200, Vik Fearing wrote:
> > [...]
> > Patch Two: autovacuum after table rewrites
> >
> > This patch addresses the absurdity that a standard VACUUM is required
> > after a VACUUM FULL because the visibility map gets blown away. This is
> > also the case for CLUSTER and some versions of ALTER TABLE that rewrite
> > the table.
>
> I think this should rather be fixed by maintaining the VM during
> cluster.
+1
IIRC there was an attempt late in the 9.5 cycle, but Bruce
(IIRC) ran out of steam. And nobody picked it up again ... :(
It may be worth looking at
/messages/by-id/CAMkU=1w3osJJ2FneELhhNRLxfZitDgp9FPHee08NT2FQFmz_pQ@mail.gmail.com
I've updated this patch to apply to current HEAD, and can propose it for pg10.
Regards,

--
Masahiko Sawada
On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik@2ndquadrant.fr> wrote:
> While doing this, I noticed that ALTER TABLE should also re-analyze the
> table for obvious reasons, so I have that one set
> "changes_since_analyze" to the number of rows rewritten as well.
I noticed that ATExecAlterColumnType does this:
    /*
     * Drop any pg_statistic entry for the column, since it's now wrong type
     */
    RemoveStatistics(RelationGetRelid(rel), attnum);
What if there is no rewrite, because the type is binary compatible
(ATColumnChangeRequiresRewrite returns false)? Then I think your patch
won't attract an autovacuum daemon to ANALYZE the table and produce new
statistics, because it won't touch changes_since_analyze. I wonder if we
should simply keep the stats if no rewrite is required. Aren't the
statistical properties of binary-compatible types also compatible?
Alternatively, we should consider bumping changes_since_analyze in this
case too, if you're going to do it in the rewrite case.
--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> I noticed that ATExecAlterColumnType does this:
>     /* Drop any pg_statistic entry for the column, since it's now wrong type */
>
> What if there is no rewrite, because the type is binary compatible
> (ATColumnChangeRequiresRewrite returns false)? Then I think your patch
> won't attract an autovacuum daemon to ANALYZE the table and produce new
> statistics, because it won't touch changes_since_analyze. I wonder if we
> should simply keep the stats if no rewrite is required. Aren't the
> statistical properties of binary-compatible types also compatible?
Not necessarily: the type semantics might be different --- in fact,
probably are different, else why would there be distinct types in the
first place? It would be unwise to keep the old stats IMO.
If you need a concrete example, consider OID vs int4. They're
binary-compatible, but since int4 is signed while OID is unsigned,
stats for one would be wrong for the other. This is the same reason
why ALTER COLUMN TYPE has to rebuild indexes even in binary-compatible
cases.
regards, tom lane
On Tue, Sep 27, 2016 at 2:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > I noticed that ATExecAlterColumnType does this:
> >     /* Drop any pg_statistic entry for the column, since it's now wrong type */
> >
> > What if there is no rewrite, because the type is binary compatible
> > (ATColumnChangeRequiresRewrite returns false)? [...] Aren't the
> > statistical properties of binary-compatible types also compatible?
>
> Not necessarily: the type semantics might be different --- in fact,
> probably are different, else why would there be distinct types in the
> first place? It would be unwise to keep the old stats IMO.
>
> If you need a concrete example, consider OID vs int4. They're
> binary-compatible, but since int4 is signed while OID is unsigned,
> stats for one would be wrong for the other. This is the same reason
> why ALTER COLUMN TYPE has to rebuild indexes even in binary-compatible
> cases.
Ah, right. Then I think this patch should somehow bump
changes_since_analyze in the no-rewrite case if it's going to do it in
the rewrite case. It would be surprising and weird if altering a
column's type *sometimes* resulted in new statistics being
automatically generated to replace those that were dropped, depending
on the technical detail of whether a rewrite was necessary.
--
Thomas Munro
http://www.enterprisedb.com
On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik@2ndquadrant.fr> wrote:
> [...] The reason such a table needs to be vacuumed is currently
> twofold: the visibility map is not updated, slowing down index-only
> scans; and BRIN indexes are not maintained, rendering them basically
> useless.
I'm aware of those two problems, but not very familiar with the
details. I don't feel qualified to say whether insert counting is the
best approach to the problem at this point. I looked into it a little
bit however, and had the following thoughts:
About BRIN indexes: I couldn't find an explanation of why BRIN
indexes don't automatically create new summary tuples when you insert
a new tuple in an unsummarised page range. Is it deferred until
VACUUM time in order to untangle some otherwise unresolvable
interlocking or crash safety problem, or could that one day be done?
Assuming that it must be deferred for some technical reason and there
is no way around it, then I wonder if there is a more direct and
accurate way to figure out when it's necessary than counting inserts.
Counting inserts seems slightly bogus because you can't tell whether
those were inserts into an existing summarised block which is
self-maintaining or not. At first glance it looks a bit like
unsummarised ranges can only appear at the end of the table, is that
right? If so, couldn't you detect the number of unsummarised BRIN
blocks just by comparing the highest summarised BRIN block and the
current heap size?
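(For comparison, brin_summarize_new_values() already summarizes exactly
those trailing unsummarised ranges on demand; the index name below is
made up:

    SELECT brin_summarize_new_values('my_brin_idx');
    -- returns the number of page ranges that were newly summarized

so perhaps that information could be derived from the index itself.)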
About visibility maps: How crazy would it be to estimate the number
of not-all-visible pages instead? It would be less work to count that
since it would only increase when the *first* tuple is inserted into a
page that is currently all visible (ie when the bit is cleared), not
for every tuple inserted into any page like your inserts_since_vacuum
counter. Another difference is that inserts_since_vacuum is reset
even if vacuum finds that it *can't* set the all-visible bit for a
given page yet because of some concurrent transaction. In that case
the bit is still not set but autovacuum has no reason to be triggered
again.
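As a rough sketch of where such an estimate could come from today, the
pg_visibility extension can report it directly, though scanning the map
on demand is of course not the same as maintaining a cheap counter
(table name made up):

    CREATE EXTENSION pg_visibility;
    -- count heap pages whose all-visible bit is not set in the VM
    SELECT count(*) FROM pg_visibility_map('my_table') WHERE NOT all_visible;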
--
Thomas Munro
http://www.enterprisedb.com
On 10/01/2016 09:28 AM, Thomas Munro wrote:
> On Mon, Aug 29, 2016 at 1:26 PM, Vik Fearing <vik@2ndquadrant.fr> wrote:
> > [...]
>
> I'm aware of those two problems, but not very familiar with the
> details. I don't feel qualified to say whether insert counting is the
> best approach to the problem at this point. I looked into it a little
> bit however, and had the following thoughts: [...]
Sure, I could handle each case separately, but the goal of this patch
(as hinted at in the Subject) is to generalize all the different tasks
we've been giving to VACUUM. The only missing piece is what the first
patch addresses; which is insert-mostly tables never getting vacuumed.
As for the second patch, I would like to withdraw it and redesign it,
based on your comments. The redesign I have in mind will no longer be
dependent on the first patch, so I'll submit it separately.
--
Vik Fearing +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, Oct 2, 2016 at 9:34 AM, Vik Fearing <vik@2ndquadrant.fr> wrote:
> On 10/01/2016 09:28 AM, Thomas Munro wrote:
> > [...]
>
> Sure, I could handle each case separately, but the goal of this patch
> (as hinted at in the Subject) is to generalize all the different tasks
> we've been giving to VACUUM. The only missing piece is what the first
> patch addresses; which is insert-mostly tables never getting vacuumed.
Yeah, that makes sense. I just wanted to discuss what the ideal
launch conditions would be for those particular ancillary jobs, and
then figure out whether the difference matters. Generally, I think
changes to autovacuum heuristics need some consensus-building
discussion, especially in light of other related ideas from Jeff
Janes, and from people involved with BRIN and visibility map design,
including Simon who signed up as a reviewer. Since we're out of time
I'm going to move this to the November CF, and let's hear from them.
--
Thomas Munro
http://www.enterprisedb.com
On Sat, Oct 1, 2016 at 1:34 PM, Vik Fearing <vik@2ndquadrant.fr> wrote:
> Sure, I could handle each case separately, but the goal of this patch
> (as hinted at in the Subject) is to generalize all the different tasks
> we've been giving to VACUUM. The only missing piece is what the first
> patch addresses; which is insert-mostly tables never getting vacuumed.
I don't buy the argument that we should do this in order to be "general".
Those other pieces are present to achieve a specific job, not out of
generality.
If we want to have something to vacuum insert-mostly tables for the sake of
the index-only-scans, then I don't think we can ignore the fact that the
visibility map is page based, not tuple based. If we insert 10,000 tuples
into a full table and they all go into 100 pages at the end, that is very
different than inserting 10,000 tuples which each go into a separate page
(because each page has that amount of freespace). In one case you have
10,000 tuples not marked as all visible, in the other case you have
1,000,000 not marked as all visible.
Also, I think that doing more counts which get amalgamated into the same
threshold, rather than introducing another parameter, is a bad thing. I
have insert-mostly, most of the time, tables which are never going to
benefit from index-only-scans, and I don't want to pay the cost of them
getting vacuumed just because of some inserts, when I will never get a
benefit out of it. I could turn autovacuum off for those tables, but then
I would have to remember to manually intervene on the rare occasion they do
get updates or deletes. I want to be able to manually pick which tables I
sign up for this feature (and then forget it), not manually pick which ones
to exempt from it and have to constantly review that.
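The kind of per-table opt-in I have in mind would look like the existing
storage-parameter knobs, something along these lines (the parameter name
is invented; nothing like it exists today):

    ALTER TABLE insert_mostly_table
        SET (autovacuum_vacuum_insert_threshold = 100000);  -- hypothetical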
Cheers,
Jeff
On Thu, Oct 6, 2016 at 3:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > Sure, I could handle each case separately, but the goal of this patch
> > (as hinted at in the Subject) is to generalize all the different tasks
> > we've been giving to VACUUM. The only missing piece is what the first
> > patch addresses; which is insert-mostly tables never getting vacuumed.
>
> I don't buy the argument that we should do this in order to be "general".
> Those other pieces are present to achieve a specific job, not out of
> generality.
+1.
> If we want to have something to vacuum insert-mostly tables for the sake
> of the index-only-scans, then I don't think we can ignore the fact that
> the visibility map is page based, not tuple based. If we insert 10,000
> tuples into a full table and they all go into 100 pages at the end, that
> is very different than inserting 10,000 tuples which each go into a
> separate page (because each page has that amount of freespace). In one
> case you have 10,000 tuples not marked as all visible, in the other case
> you have 1,000,000 not marked as all visible.
+1.
> Also, I think that doing more counts which get amalgamated into the same
> threshold, rather than introducing another parameter, is a bad thing. I
> have insert-mostly, most of the time, tables which are never going to
> benefit from index-only-scans, and I don't want to pay the cost of them
> getting vacuumed just because of some inserts, when I will never get a
> benefit out of it. I could turn autovacuum off for those tables, but
> then I would have to remember to manually intervene on the rare occasion
> they do get updates or deletes. I want to be able to manually pick which
> tables I sign up for this feature (and then forget it), not manually
> pick which ones to exempt from it and have to constantly review that.
Of course, if you do that, then what will happen is eventually it will
be time to advance relfrozenxid for that relation, and you'll get a
giant soul-crushing VACUUM of doom, rather than a bunch of small,
hopefully-innocuous VACUUMs. I've been wondering if it would make sense
to trigger vacuums based on inserts using a *fixed* number of
pages, rather than a percentage of the table. Say, for example,
whenever we have 64MB of not-all-visible pages, we VACUUM.
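(To put a number on that, with the default 8kB block size:

    SELECT pg_size_bytes('64MB') / current_setting('block_size')::int;
    -- 8192 pages

so the trigger would be roughly 8192 not-all-visible pages, independent
of table size.)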
There are some difficulties:
1. We don't want to vacuum too early. For example, a bulk load may
vastly inflate the size of a table, but vacuuming it while the load is
in progress will be useless. You can maybe avoid that problem by
basing this on statistics only reported at the end of the transaction,
but there's another, closely-related issue: vacuuming right after the
transaction completes may be useless, too, if there are old snapshots
still in existence that render the newly-inserted tuples
not-all-visible.
2. We don't want to trigger index vacuuming for a handful of dead
tuples, or at least not too often. I've previously suggested
requiring a certain minimum number of dead index tuples that would be
required before index vacuuming occurs; prior to that, we'd just
truncate to dead line pointers and leave those for the next vacuum
cycle. This might need some refinement - should it be page-based? -
but the basic idea still seems sound.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 6, 2016 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 6, 2016 at 3:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > Also, I think that doing more counts which get amalgamated into the
> > same threshold, rather than introducing another parameter, is a bad
> > thing. [...] I want to be able to manually pick which tables I sign up
> > for this feature (and then forget it), not manually pick which ones to
> > exempt from it and have to constantly review that.
>
> Of course, if you do that, then what will happen is eventually it will
> be time to advance relfrozenxid for that relation, and you'll get a
> giant soul-crushing VACUUM of doom, rather than a bunch of small,
> hopefully-innocuous VACUUMs.
I think I will get the soul-crushing vacuum of doom anyway, if the database
lasts that long. For one reason, in my case while updates and deletes are
rare, they are common enough that the frozen vm bits from early vacuums
will be mostly cleared before the vacuum of doom comes around.
For a second reason, I don't think the frozen bit in the vm actually gets
set by much of anything other than an autovacuum-for-wraparound or a
manual VACUUM FREEZE.
In commit 37484ad2aacef5ec7, you changed the way that frozen tuples were
represented, so that we could make freezing more aggressive without losing
forensic evidence. But I don't think we ever did anything to actually make
the freezing more aggressive.
When I applied the up-thread patch so that pgbench_history gets autovac,
those autovacs don't actually cause any pages to get frozen until the wrap
around kicks in, even when all the tuples on the early pages should be well
beyond vacuum_freeze_min_age.
Cheers,
Jeff
On Fri, Oct 7, 2016 at 9:40 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Oct 6, 2016 at 2:46 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > [...]
> > Of course, if you do that, then what will happen is eventually it will
> > be time to advance relfrozenxid for that relation, and you'll get a
> > giant soul-crushing VACUUM of doom, rather than a bunch of small,
> > hopefully-innocuous VACUUMs.
>
> I think I will get the soul-crushing vacuum of doom anyway, if the
> database lasts that long. For one reason, in my case while updates and
> deletes are rare, they are common enough that the frozen vm bits from
> early vacuums will be mostly cleared before the vacuum of doom comes
> around.
>
> For a second reason, I don't think the frozen bit in the vm actually
> gets set by much of anything other than an autovacuum-for-wraparound or
> a manual VACUUM FREEZE.
Yeah, the freeze map would be effective especially for static tables.
Since we can skip vacuuming frozen pages and freezing tuples is usually
not a big pain, we might want to make autovacuum more aggressive about
freezing pages by reducing the default value of vacuum_freeze_min_age.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Oct 7, 2016 at 6:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 6, 2016 at 3:56 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > [...]
>
> [...] I've been wondering if it would make sense to trigger vacuums
> based on inserts using a *fixed* number of pages, rather than a
> percentage of the table. Say, for example, whenever we have 64MB of
> not-all-visible pages, we VACUUM.
+1
> There are some difficulties:
>
> 1. We don't want to vacuum too early. For example, a bulk load may
> vastly inflate the size of a table, but vacuuming it while the load is
> in progress will be useless. You can maybe avoid that problem by
> basing this on statistics only reported at the end of the transaction,
> but there's another, closely-related issue: vacuuming right after the
> transaction completes may be useless, too, if there are old snapshots
> still in existence that render the newly-inserted tuples
> not-all-visible.
If a dummy xid could be generated before starting a vacuum for VM
maintenance (triggered by the amount of cleared VM pages), that vacuum
could wait for the old transactions to finish.
> 2. We don't want to trigger index vacuuming for a handful of dead
> tuples, or at least not too often. I've previously suggested
> requiring a certain minimum number of dead index tuples that would be
> required before index vacuuming occurs; prior to that, we'd just
> truncate to dead line pointers and leave those for the next vacuum
> cycle. This might need some refinement - should it be page-based? -
> but the basic idea still seems sound.
Where would we leave the dead line pointers?
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Oct 6, 2016 at 8:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> In commit 37484ad2aacef5ec7, you changed the way that frozen tuples were
> represented, so that we could make freezing more aggressive without
> losing forensic evidence. But I don't think we ever did anything to
> actually make the freezing more aggressive.
See 3cff1879f8d03cb729368722ca823a4bf74c0cac. The objection to doing
it in other cases is that it adds write-ahead log volume which might
cause its own share of problems. There might be some way of getting
ahead of that, though. I think if we piggyback on an existing WAL
record like XLOG_HEAP2_CLEAN or XLOG_HEAP2_VISIBLE the impact might be
minimal, but I haven't been dedicated enough to try to write the
patch.
> When I applied the up-thread patch so that pgbench_history gets autovac,
> those autovacs don't actually cause any pages to get frozen until the
> wrap around kicks in, even when all the tuples on the early pages should
> be well beyond vacuum_freeze_min_age.
If the pages are already all-visible, they'll be skipped until
vacuum_freeze_table_age is reached.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Oct 7, 2016 at 6:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 6, 2016 at 8:40 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > [...]
> > When I applied the up-thread patch so that pgbench_history gets
> > autovac, those autovacs don't actually cause any pages to get frozen
> > until the wrap around kicks in, even when all the tuples on the early
> > pages should be well beyond vacuum_freeze_min_age.
>
> If the pages are already all-visible, they'll be skipped until
> vacuum_freeze_table_age is reached.
So if I set vacuum_freeze_min_age to zero, then they should become
all-visible and all-frozen at the same time, and avoid that problem?

From the docs on vacuum_freeze_min_age: "Increasing this setting may avoid
unnecessary work if the rows that would otherwise be frozen will soon be
modified again". How much work is that? Presumably they are already
getting marked visible; is marking them frozen too a meaningful amount of
extra work? Is it just the extra WAL record?
Cheers,
Jeff
On Sun, Oct 16, 2016 at 3:35 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Oct 7, 2016 at 6:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > [...]
> > If the pages are already all-visible, they'll be skipped until
> > vacuum_freeze_table_age is reached.
>
> So if I set vacuum_freeze_min_age to zero, then they should become
> all-visible and all-frozen at the same time, and avoid that problem?
Hmm. I *think* so...
> From the docs on vacuum_freeze_min_age: "Increasing this setting may
> avoid unnecessary work if the rows that would otherwise be frozen will
> soon be modified again". How much work is that? Presumably they are
> already getting marked visible; is marking them frozen too a meaningful
> amount of extra work? Is it just the extra WAL record?
Yeah, the extra WAL record is the main thing, I think.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Oct 19, 2016 at 2:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Oct 16, 2016 at 3:35 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> > [...]
> > Is it just the extra WAL record?
>
> Yeah, the extra WAL record is the main thing, I think.
Closed in the 2016-11 commitfest with "returned with feedback" status, as
per my understanding from the recent mails on the thread.

Please feel free to change the status if it doesn't reflect the actual
status.
Regards,
Hari Babu
Fujitsu Australia
Thomas Munro wrote:
> About BRIN indexes: I couldn't find an explanation of why BRIN
> indexes don't automatically create new summary tuples when you insert
> a new tuple in an unsummarised page range. Is it deferred until
> VACUUM time in order to untangle some otherwise unresolvable
> interlocking or crash safety problem, or could that one day be done?
The reason is performance for the bulk insert case, which we don't want
to slow down; the range summarization is done at a later time by a
background process so that the inserting process is not slowed down by
having to repeatedly re-compute the summary tuple for each heap
insertion. I think the ideal mechanism would be that a summarization is
signalled somehow (to another process) as soon as an insertion occupies
a block just past the previous unsummarized range. (If there are many
readers, perhaps it's better to summarize when the range is half full or
something like that.)
We could have a reloption that switches from this behavior to the other
obvious possibility which is to insert a new summary tuple upon the
first heap insertion to an unsummarized range, but ISTM that that
behavior is pessimal.
> Counting inserts seems slightly bogus because you can't tell whether
> those were inserts into an existing summarised block which is
> self-maintaining or not. At first glance it looks a bit like
> unsummarised ranges can only appear at the end of the table, is that
> right? If so, couldn't you detect the number of unsummarised BRIN
> blocks just by comparing the highest summarised BRIN block and the
> current heap size?
We don't have a mechanism to invalidate the summary of a range thus far,
so yeah, we could try to detect it directly as you suggest.
I would like to be able to invalidate these tuples though, for the case
where many tuples are removed from a range (a fresh summarization could
produce tighter limits). I think this problem does not necessarily
invalidate the idea above.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services