forcing a rebuild of the visibility map

Started by Robert Haasover 9 years ago12 messages
#1Robert Haas
robertmhaas@gmail.com
2 attachment(s)

[ Changing subject line in the hopes of keeping separate issues in
separate threads. ]

On Mon, Jun 6, 2016 at 11:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm intuitively sympathetic to the idea that we should have an option
for this, but I can't figure out in what case we'd actually tell
anyone to use it. It would be useful for the kinds of bugs listed
above to have VACUUM (rebuild_vm) to blow away the VM fork and rebuild
it, but that's different semantics than what we proposed for VACUUM
(even_frozen_pages). And I'd be sort of inclined to handle that case
by providing some other way to remove VM forks (like a new function in
the pg_visibilitymap contrib module, maybe?) and then just tell people
to run regular VACUUM afterwards, rather than putting the actual VM
fork removal into VACUUM.

There's a lot to be said for that approach. If we do it, I'd be a bit
inclined to offer an option to blow away the FSM as well.

After having been scared out of my mind by the discovery of
longstanding breakage in heap_update[1]/messages/by-id/CA+TgmoZ-1TXCsxWcp3i-KMo5DNB-6p-odqw7c-N_q3pzZY1TJg@mail.gmail.com, it occurred to me that this
is an excellent example of a case in which the option for which Andres
was agitating - specifically forcing VACUUM to visit ever single page
in the heap - would be useful. Even if there's functionality
available to blow away the visibility map forks, you might not want to
just go remove them all and re-vacuum everything, because while you
were rebuilding the VM forks, index-only scans would stop being
index-only, and that might cause an operational problem. Being able
to simply force every page to be visited is better. Patch for that
attached. I decided to call the option disable_page_skipping rather
than even_frozen_pages because it should also force visiting pages
that are all-visible but not all-frozen. (I was sorely tempted to go
with the competing proposal of calling it VACUUM (THEWHOLEDAMNTHING)
but I refrained.)

However, I also think that the approach described above - providing a
way to excise VM forks specifically - is useful. Patch for that
attached, too. It turns out that can't be done without either adding
a new WAL record type or extending an existing one; I chose to add a
"flags" argument to XLOG_SMGR_TRUNCATE, specifying the forks to be
truncated. Since this will require bumping the XLOG version, if we're
going to do it, and I think we should, it would be good to try to get
it done before beta2 rather than closer to release. However, I don't
want to commit this without some review, so please review this.
Actually, please review both patches.[2]This request is directed at the list generally, not at any specific individual ... well, OK, I mean you. Yeah, you! The same core support could
be used to add an option to truncate the FSM, but I didn't write a
patch for that because I'm incredibly { busy | lazy }.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]: /messages/by-id/CA+TgmoZ-1TXCsxWcp3i-KMo5DNB-6p-odqw7c-N_q3pzZY1TJg@mail.gmail.com
[2]: This request is directed at the list generally, not at any specific individual ... well, OK, I mean you. Yeah, you!
specific individual ... well, OK, I mean you. Yeah, you!

Attachments:

truncate-vm-v1.patchtext/x-diff; charset=US-ASCII; name=truncate-vm-v1.patchDownload
From e2b6404c9a9ecb22eec41d010ce786cc27d73fc7 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jun 2016 11:33:27 -0400
Subject: [PATCH] pg_visibility: Add pg_truncate_visibility_map function.

This requires some core changes as well so that we can properly
WAL-log the truncation.
---
 contrib/pg_visibility/pg_visibility--1.0--1.1.sql |  6 +++
 contrib/pg_visibility/pg_visibility--1.1.sql      |  8 ++++
 contrib/pg_visibility/pg_visibility.c             | 53 +++++++++++++++++++++++
 doc/src/sgml/pgvisibility.sgml                    | 30 ++++++++++---
 src/backend/access/rmgrdesc/smgrdesc.c            |  3 +-
 src/backend/catalog/storage.c                     | 16 ++++---
 src/include/catalog/storage_xlog.h                |  8 ++++
 7 files changed, 112 insertions(+), 12 deletions(-)

diff --git a/contrib/pg_visibility/pg_visibility--1.0--1.1.sql b/contrib/pg_visibility/pg_visibility--1.0--1.1.sql
index 2c97dfd..9336b48 100644
--- a/contrib/pg_visibility/pg_visibility--1.0--1.1.sql
+++ b/contrib/pg_visibility/pg_visibility--1.0--1.1.sql
@@ -13,5 +13,11 @@ RETURNS SETOF tid
 AS 'MODULE_PATHNAME', 'pg_check_visible'
 LANGUAGE C STRICT;
 
+CREATE FUNCTION pg_truncate_visibility_map(regclass)
+RETURNS void
+AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
+LANGUAGE C STRICT
+PARALLEL UNSAFE;  -- let's not make this any more dangerous
+
 REVOKE ALL ON FUNCTION pg_check_frozen(regclass) FROM PUBLIC;
 REVOKE ALL ON FUNCTION pg_check_visible(regclass) FROM PUBLIC;
diff --git a/contrib/pg_visibility/pg_visibility--1.1.sql b/contrib/pg_visibility/pg_visibility--1.1.sql
index b49b644..0a29967 100644
--- a/contrib/pg_visibility/pg_visibility--1.1.sql
+++ b/contrib/pg_visibility/pg_visibility--1.1.sql
@@ -57,6 +57,13 @@ RETURNS SETOF tid
 AS 'MODULE_PATHNAME', 'pg_check_visible'
 LANGUAGE C STRICT;
 
+-- Truncate the visibility map fork.
+CREATE FUNCTION pg_truncate_visibility_map(regclass)
+RETURNS void
+AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
+LANGUAGE C STRICT
+PARALLEL UNSAFE;  -- let's not make this any more dangerous
+
 -- Don't want these to be available to public.
 REVOKE ALL ON FUNCTION pg_visibility_map(regclass, bigint) FROM PUBLIC;
 REVOKE ALL ON FUNCTION pg_visibility(regclass, bigint) FROM PUBLIC;
@@ -65,3 +72,4 @@ REVOKE ALL ON FUNCTION pg_visibility(regclass) FROM PUBLIC;
 REVOKE ALL ON FUNCTION pg_visibility_map_summary(regclass) FROM PUBLIC;
 REVOKE ALL ON FUNCTION pg_check_frozen(regclass) FROM PUBLIC;
 REVOKE ALL ON FUNCTION pg_check_visible(regclass) FROM PUBLIC;
+REVOKE ALL ON FUNCTION pg_truncate_visibility_map(regclass) FROM PUBLIC;
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index abb92f3..eb3b0ca 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -11,10 +11,12 @@
 #include "access/htup_details.h"
 #include "access/visibilitymap.h"
 #include "catalog/pg_type.h"
+#include "catalog/storage_xlog.h"
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
 #include "storage/procarray.h"
+#include "storage/smgr.h"
 #include "utils/rel.h"
 
 PG_MODULE_MAGIC;
@@ -40,6 +42,7 @@ PG_FUNCTION_INFO_V1(pg_visibility_rel);
 PG_FUNCTION_INFO_V1(pg_visibility_map_summary);
 PG_FUNCTION_INFO_V1(pg_check_frozen);
 PG_FUNCTION_INFO_V1(pg_check_visible);
+PG_FUNCTION_INFO_V1(pg_truncate_visibility_map);
 
 static TupleDesc pg_visibility_tupdesc(bool include_blkno, bool include_pd);
 static vbits *collect_visibility_data(Oid relid, bool include_pd);
@@ -336,6 +339,56 @@ pg_check_visible(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Remove the visibility map fork for a relation.  If there turn out to be
+ * any bugs in the visibility map code that require rebuilding the VM, this
+ * provides users with a way to do it that is cleaner than shutting down the
+ * server and removing files by hand.
+ *
+ * This is a cut-down version of RelationTruncate.
+ */
+Datum
+pg_truncate_visibility_map(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	Relation	rel;
+
+	rel = relation_open(relid, AccessExclusiveLock);
+
+	if (rel->rd_rel->relkind != RELKIND_RELATION &&
+		rel->rd_rel->relkind != RELKIND_MATVIEW &&
+		rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+		   errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+				  RelationGetRelationName(rel))));
+
+	RelationOpenSmgr(rel);
+	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
+
+	visibilitymap_truncate(rel, 0);
+
+	if (RelationNeedsWAL(rel))
+	{
+		xl_smgr_truncate	xlrec;
+
+		xlrec.blkno = 0;
+		xlrec.rnode = rel->rd_node;
+		xlrec.flags = SMGR_TRUNCATE_VM;
+
+		XLogBeginInsert();
+		XLogRegisterData((char *) &xlrec, sizeof(xlrec));
+
+		XLogInsert(RM_SMGR_ID, XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
+	}
+
+	/* Don't keep the relation locked any longer than necessary! */
+	relation_close(rel, AccessExclusiveLock);
+
+	/* Nothing to return. */
+	PG_RETURN_VOID();
+}
+
+/*
  * Helper function to construct whichever TupleDesc we need for a particular
  * call.
  */
diff --git a/doc/src/sgml/pgvisibility.sgml b/doc/src/sgml/pgvisibility.sgml
index 4cdca7d..44e83de 100644
--- a/doc/src/sgml/pgvisibility.sgml
+++ b/doc/src/sgml/pgvisibility.sgml
@@ -9,14 +9,16 @@
 
  <para>
   The <filename>pg_visibility</> module provides a means for examining the
-  visibility map (VM) and page-level visibility information.
+  visibility map (VM) and page-level visibility information.  It also
+  provides functions to check the integrity of the visibility map and to
+  force it to be rebuilt.
  </para>
 
  <para>
-  These routines return information about three different bits.  The
-  all-visible bit in the visibility map indicates that every tuple on
-  a given page of a relation is visible to every current transaction.  The
-  all-frozen bit in the visibility map indicates that every tuple on the
+  Three different bits are used to store information about page-level
+  visibility.  The all-visible bit in the visibility map indicates that every
+  tuple on a given page of a relation is visible to every current transaction.
+  The all-frozen bit in the visibility map indicates that every tuple on the
   page is frozen; that is, no future vacuum will need to modify the page
   until such time as a tuple is inserted, updated, deleted, or locked on
   that page.  The page-level <literal>PD_ALL_VISIBLE</literal> bit has the
@@ -25,7 +27,8 @@
   will normally agree, but the page-level bit can sometimes be set while the
   visibility map bit is clear after a crash recovery; or they can disagree
   because of a change which occurs after <literal>pg_visibility</> examines
-  the visibility map and before it examines the data page.
+  the visibility map and before it examines the data page.  Any event which
+  causes data corruption can also cause these bits to disagree.
  </para>
 
  <para>
@@ -118,6 +121,21 @@
      </para>
     </listitem>
    </varlistentry>
+
+   <varlistentry>
+    <term><function>pg_truncate_visibility_map(regclass) returns void</function></term>
+
+    <listitem>
+     <para>
+      Truncates the visibility map for the given relation.  This function
+      is only expected to be useful if you suspect that the visibility map
+      for the indicated relation is corrupt and wish to rebuild it.  The first
+      <command>VACUUM</> executed on the given relation after this function
+      is executed will scan every page in the relation and rebuild the
+      visibility map.
+     </para>
+    </listitem>
+   </varlistentry>
   </variablelist>
 
   <para>
diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index 0c6e583..242d79a 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -37,7 +37,8 @@ smgr_desc(StringInfo buf, XLogReaderState *record)
 		xl_smgr_truncate *xlrec = (xl_smgr_truncate *) rec;
 		char	   *path = relpathperm(xlrec->rnode, MAIN_FORKNUM);
 
-		appendStringInfo(buf, "%s to %u blocks", path, xlrec->blkno);
+		appendStringInfo(buf, "%s to %u blocks flags %d", path,
+						 xlrec->blkno, xlrec->flags);
 		pfree(path);
 	}
 }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index 67f1906..0d8311c 100644
--- a/src/backend/catalog/storage.c
+++ b/src/backend/catalog/storage.c
@@ -268,6 +268,7 @@ RelationTruncate(Relation rel, BlockNumber nblocks)
 
 		xlrec.blkno = nblocks;
 		xlrec.rnode = rel->rd_node;
+		xlrec.flags = SMGR_TRUNCATE_ALL;
 
 		XLogBeginInsert();
 		XLogRegisterData((char *) &xlrec, sizeof(xlrec));
@@ -522,17 +523,22 @@ smgr_redo(XLogReaderState *record)
 		 */
 		XLogFlush(lsn);
 
-		smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
+		if ((xlrec->flags & SMGR_TRUNCATE_HEAP) != 0)
+		{
+			smgrtruncate(reln, MAIN_FORKNUM, xlrec->blkno);
 
-		/* Also tell xlogutils.c about it */
-		XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+			/* Also tell xlogutils.c about it */
+			XLogTruncateRelation(xlrec->rnode, MAIN_FORKNUM, xlrec->blkno);
+		}
 
 		/* Truncate FSM and VM too */
 		rel = CreateFakeRelcacheEntry(xlrec->rnode);
 
-		if (smgrexists(reln, FSM_FORKNUM))
+		if ((xlrec->flags & SMGR_TRUNCATE_FSM) != 0 &&
+			smgrexists(reln, FSM_FORKNUM))
 			FreeSpaceMapTruncateRel(rel, xlrec->blkno);
-		if (smgrexists(reln, VISIBILITYMAP_FORKNUM))
+		if ((xlrec->flags & SMGR_TRUNCATE_VM) != 0 &&
+			smgrexists(reln, VISIBILITYMAP_FORKNUM))
 			visibilitymap_truncate(rel, xlrec->blkno);
 
 		FreeFakeRelcacheEntry(rel);
diff --git a/src/include/catalog/storage_xlog.h b/src/include/catalog/storage_xlog.h
index 7207e8b..500e663 100644
--- a/src/include/catalog/storage_xlog.h
+++ b/src/include/catalog/storage_xlog.h
@@ -36,10 +36,18 @@ typedef struct xl_smgr_create
 	ForkNumber	forkNum;
 } xl_smgr_create;
 
+/* flags for xl_smgr_truncate */
+#define SMGR_TRUNCATE_HEAP		0x0001
+#define SMGR_TRUNCATE_VM		0x0002
+#define SMGR_TRUNCATE_FSM		0x0004
+#define SMGR_TRUNCATE_ALL		\
+	(SMGR_TRUNCATE_HEAP|SMGR_TRUNCATE_VM|SMGR_TRUNCATE_FSM)
+
 typedef struct xl_smgr_truncate
 {
 	BlockNumber blkno;
 	RelFileNode rnode;
+	int			flags;
 } xl_smgr_truncate;
 
 extern void log_smgrcreate(RelFileNode *rnode, ForkNumber forkNum);
-- 
2.5.4 (Apple Git-61)

disable-page-skipping-v1.patchtext/x-diff; charset=US-ASCII; name=disable-page-skipping-v1.patchDownload
From 010e99b403ec733d50c71a7d4ef646b1b446ef07 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jun 2016 22:52:58 -0400
Subject: [PATCH 2/2] Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.

---
 doc/src/sgml/ref/vacuum.sgml         | 21 ++++++++-
 src/backend/commands/vacuum.c        |  9 ++++
 src/backend/commands/vacuumlazy.c    | 87 ++++++++++++++++++++----------------
 src/backend/parser/gram.y            | 10 +++++
 src/include/nodes/parsenodes.h       |  3 +-
 src/test/regress/expected/vacuum.out |  1 +
 src/test/regress/sql/vacuum.sql      |  2 +
 7 files changed, 92 insertions(+), 41 deletions(-)

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 19fd748..dee1c5b 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
+VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable class="PARAMETER">table_name</replaceable> ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 </synopsis>
@@ -130,6 +130,25 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">
    </varlistentry>
 
    <varlistentry>
+    <term><literal>DISABLE_PAGE_SKIPPING</literal></term>
+    <listitem>
+     <para>
+      Normally, <command>VACUUM</> will skip pages based on the <link
+      linkend="vacuum-for-visibility-map">visibility map</>.  Pages where
+      all tuples are known to be frozen can always be skipped, and those
+      where all tuples are known to be visible to all transactions may be
+      skipped except when performing an aggressive vacuum.  Furthermore,
+      except when performing an aggressive vacuum, some pages may be skipped
+      in order to avoid waiting for other sessions to finish using them.
+      This option disables all page-skipping behavior, and is intended to
+      be used only the contents of the visibility map are thought to
+      be suspect, which should happen only if there is a hardware or software
+      issue causing database corruption.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><replaceable class="PARAMETER">table_name</replaceable></term>
     <listitem>
      <para>
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 25a55ab..0563e63 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -186,6 +186,15 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 						stmttype)));
 
 	/*
+	 * Sanity check DISABLE_PAGE_SKIPPING option.
+	 */
+	if ((options & VACOPT_FULL) != 0 &&
+		(options & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
+
+	/*
 	 * Send info about dead objects to the statistics collector, unless we are
 	 * in autovacuum --- autovacuum.c does this for itself.
 	 */
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index cb5777f..7a67fa5 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -137,8 +137,9 @@ static BufferAccessStrategy vac_strategy;
 
 
 /* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive);
+static void lazy_scan_heap(Relation onerel, int options,
+			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
+			   bool aggressive);
 static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
@@ -223,15 +224,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 						  &MultiXactCutoff, &mxactFullScanLimit);
 
 	/*
-	 * We request an aggressive scan if either the table's frozen Xid is now
+	 * We request an aggressive scan if the table's frozen Xid is now
 	 * older than or equal to the requested Xid full-table scan limit; or if
 	 * the table's minimum MultiXactId is older than or equal to the requested
-	 * mxid full-table scan limit.
+	 * mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was specified.
 	 */
 	aggressive = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
 											   xidFullScanLimit);
 	aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
 											  mxactFullScanLimit);
+	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+		aggressive = true;
 
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
 
@@ -246,7 +249,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->hasindex = (nindexes > 0);
 
 	/* Do the vacuuming */
-	lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, aggressive);
+	lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
 
 	/* Done with indexes */
 	vac_close_indexes(nindexes, Irel, NoLock);
@@ -441,7 +444,7 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
  *		reference them have been killed.
  */
 static void
-lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
+lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			   Relation *Irel, int nindexes, bool aggressive)
 {
 	BlockNumber nblocks,
@@ -542,25 +545,28 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 	 * the last page.  This is worth avoiding mainly because such a lock must
 	 * be replayed on any hot standby, where it can be disruptive.
 	 */
-	for (next_unskippable_block = 0;
-		 next_unskippable_block < nblocks;
-		 next_unskippable_block++)
+	next_unskippable_block = 0;
+	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
 	{
-		uint8		vmstatus;
-
-		vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
-											&vmbuffer);
-		if (aggressive)
+		while (next_unskippable_block < nblocks)
 		{
-			if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-				break;
-		}
-		else
-		{
-			if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-				break;
+			uint8		vmstatus;
+
+			vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
+												&vmbuffer);
+			if (aggressive)
+			{
+				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+					break;
+			}
+			else
+			{
+				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+					break;
+			}
+			vacuum_delay_point();
+			next_unskippable_block++;
 		}
-		vacuum_delay_point();
 	}
 
 	if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
@@ -594,26 +600,29 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
 		if (blkno == next_unskippable_block)
 		{
 			/* Time to advance next_unskippable_block */
-			for (next_unskippable_block++;
-				 next_unskippable_block < nblocks;
-				 next_unskippable_block++)
+			next_unskippable_block++;
+			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
 			{
-				uint8		vmskipflags;
-
-				vmskipflags = visibilitymap_get_status(onerel,
-													   next_unskippable_block,
-													   &vmbuffer);
-				if (aggressive)
+				while (next_unskippable_block < nblocks)
 				{
-					if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-						break;
-				}
-				else
-				{
-					if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-						break;
+					uint8		vmskipflags;
+
+					vmskipflags = visibilitymap_get_status(onerel,
+													  next_unskippable_block,
+														   &vmbuffer);
+					if (aggressive)
+					{
+						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
+							break;
+					}
+					else
+					{
+						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
+							break;
+					}
+					vacuum_delay_point();
+					next_unskippable_block++;
 				}
-				vacuum_delay_point();
 			}
 
 			/*
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 2c950f9..edf4516 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -9370,6 +9370,16 @@ vacuum_option_elem:
 			| VERBOSE			{ $$ = VACOPT_VERBOSE; }
 			| FREEZE			{ $$ = VACOPT_FREEZE; }
 			| FULL				{ $$ = VACOPT_FULL; }
+			| IDENT
+				{
+					if (strcmp($1, "disable_page_skipping") == 0)
+						$$ = VACOPT_DISABLE_PAGE_SKIPPING;
+					else
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+							 errmsg("unrecognized VACUUM option \"%s\"", $1),
+									 parser_errposition(@1)));
+				}
 		;
 
 AnalyzeStmt:
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 714cf15..d36d9c6 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2822,7 +2822,8 @@ typedef enum VacuumOption
 	VACOPT_FREEZE = 1 << 3,		/* FREEZE option */
 	VACOPT_FULL = 1 << 4,		/* FULL (non-concurrent) vacuum */
 	VACOPT_NOWAIT = 1 << 5,		/* don't wait to get lock (autovacuum only) */
-	VACOPT_SKIPTOAST = 1 << 6	/* don't process the TOAST table, if any */
+	VACOPT_SKIPTOAST = 1 << 6,	/* don't process the TOAST table, if any */
+	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7	/* don't skip any pages */
 } VacuumOption;
 
 typedef struct VacuumStmt
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index d2d7503..9b604be 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -79,5 +79,6 @@ ERROR:  ANALYZE cannot be executed from VACUUM or ANALYZE
 CONTEXT:  SQL function "do_analyze" statement 1
 SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
+VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index f841201..7b819f6 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -60,5 +60,7 @@ VACUUM FULL pg_database;
 VACUUM FULL vaccluster;
 VACUUM FULL vactst;
 
+VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+
 DROP TABLE vaccluster;
 DROP TABLE vactst;
-- 
2.5.4 (Apple Git-61)

#2Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#1)
1 attachment(s)
Re: forcing a rebuild of the visibility map

On Thu, Jun 16, 2016 at 12:06 PM, Robert Haas <robertmhaas@gmail.com> wrote:

[ Changing subject line in the hopes of keeping separate issues in
separate threads. ]

On Mon, Jun 6, 2016 at 11:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Robert Haas <robertmhaas@gmail.com> writes:

I'm intuitively sympathetic to the idea that we should have an option
for this, but I can't figure out in what case we'd actually tell
anyone to use it. It would be useful for the kinds of bugs listed
above to have VACUUM (rebuild_vm) to blow away the VM fork and rebuild
it, but that's different semantics than what we proposed for VACUUM
(even_frozen_pages). And I'd be sort of inclined to handle that case
by providing some other way to remove VM forks (like a new function in
the pg_visibilitymap contrib module, maybe?) and then just tell people
to run regular VACUUM afterwards, rather than putting the actual VM
fork removal into VACUUM.

There's a lot to be said for that approach. If we do it, I'd be a bit
inclined to offer an option to blow away the FSM as well.

After having been scared out of my mind by the discovery of
longstanding breakage in heap_update[1], it occurred to me that this
is an excellent example of a case in which the option for which Andres
was agitating - specifically forcing VACUUM to visit ever single page
in the heap - would be useful. Even if there's functionality
available to blow away the visibility map forks, you might not want to
just go remove them all and re-vacuum everything, because while you
were rebuilding the VM forks, index-only scans would stop being
index-only, and that might cause an operational problem. Being able
to simply force every page to be visited is better. Patch for that
attached. I decided to call the option disable_page_skipping rather
than even_frozen_pages because it should also force visiting pages
that are all-visible but not all-frozen. (I was sorely tempted to go
with the competing proposal of calling it VACUUM (THEWHOLEDAMNTHING)
but I refrained.)

However, I also think that the approach described above - providing a
way to excise VM forks specifically - is useful. Patch for that
attached, too. It turns out that can't be done without either adding
a new WAL record type or extending an existing one; I chose to add a
"flags" argument to XLOG_SMGR_TRUNCATE, specifying the forks to be
truncated. Since this will require bumping the XLOG version, if we're
going to do it, and I think we should, it would be good to try to get
it done before beta2 rather than closer to release. However, I don't
want to commit this without some review, so please review this.
Actually, please review both patches.[2] The same core support could
be used to add an option to truncate the FSM, but I didn't write a
patch for that because I'm incredibly { busy | lazy }.

Option name DISABLE_PAGE_SKIPPING is good to me.
I'm still working on this, but here are some review comments.

---
+CREATE FUNCTION pg_truncate_visibility_map(regclass)
+RETURNS void
+AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
+LANGUAGE C STRICT
+PARALLEL UNSAFE;  -- let's not make this any more dangerous
+

"REVOKE ALL ON FUNCTION pg_truncate_visibility_map(regclass) FROM
PUBLIC;" is missing in pg_visibility/pg_visibility--1.0--1.1.sql.

---
I think that VACUUM with VERBOSE option can show the information for
how many frozen pages we skipped like autovacuum does. Thought?
Patch attached.

Regards,

--
Masahiko Sawada

Attachments:

lazy_scan_heap_verbose_log.patchtext/x-diff; charset=US-ASCII; name=lazy_scan_heap_verbose_log.patchDownload
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 7a67fa5..ddbb835 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -1329,6 +1329,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 									"Skipped %u pages due to buffer pins.\n",
 									vacrelstats->pinskipped_pages),
 					 vacrelstats->pinskipped_pages);
+	appendStringInfo(&buf, _("Skipped %u frozen pages.\n"),
+					 vacrelstats->frozenskipped_pages);
 	appendStringInfo(&buf, ngettext("%u page is entirely empty.\n",
 									"%u pages are entirely empty.\n",
 									empty_pages),
#3Robert Haas
robertmhaas@gmail.com
In reply to: Masahiko Sawada (#2)
Re: forcing a rebuild of the visibility map

On Thu, Jun 16, 2016 at 2:33 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Option name DISABLE_PAGE_SKIPPING is good to me.
I'm still working on this, but here are some review comments.

---
+CREATE FUNCTION pg_truncate_visibility_map(regclass)
+RETURNS void
+AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
+LANGUAGE C STRICT
+PARALLEL UNSAFE;  -- let's not make this any more dangerous
+

"REVOKE ALL ON FUNCTION pg_truncate_visibility_map(regclass) FROM
PUBLIC;" is missing in pg_visibility/pg_visibility--1.0--1.1.sql.

OK, thanks. I'll fix that.

I think that VACUUM with VERBOSE option can show the information for
how many frozen pages we skipped like autovacuum does. Thought?
Patch attached.

I'm not sure. The messages VACUUM emits are already quite long and
hard to read, and adding more lines to them might make that problem
worse. On the other hand, having more details can be helpful, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#4Masahiko Sawada
sawada.mshk@gmail.com
In reply to: Robert Haas (#3)
Re: forcing a rebuild of the visibility map

On Thu, Jun 16, 2016 at 10:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Jun 16, 2016 at 2:33 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Option name DISABLE_PAGE_SKIPPING is good to me.
I'm still working on this, but here are some review comments.

---
+CREATE FUNCTION pg_truncate_visibility_map(regclass)
+RETURNS void
+AS 'MODULE_PATHNAME', 'pg_truncate_visibility_map'
+LANGUAGE C STRICT
+PARALLEL UNSAFE;  -- let's not make this any more dangerous
+

"REVOKE ALL ON FUNCTION pg_truncate_visibility_map(regclass) FROM
PUBLIC;" is missing in pg_visibility/pg_visibility--1.0--1.1.sql.

OK, thanks. I'll fix that.

I think that VACUUM with VERBOSE option can show the information for
how many frozen pages we skipped like autovacuum does. Thought?
Patch attached.

I'm not sure. The messages VACUUM emits are already quite long and
hard to read, and adding more lines to them might make that problem
worse. On the other hand, having more details can be helpful, too.

Because these informations are emitted only when VERBOSE option is
specified, I think that it would be better to output that rather than
nothing, although it makes messages more complicated.

Regards,

--
Masahiko Sawada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#5Andres Freund
andres@anarazel.de
In reply to: Robert Haas (#1)
Re: forcing a rebuild of the visibility map

Hi,

On 2016-06-15 23:06:37 -0400, Robert Haas wrote:

After having been scared out of my mind by the discovery of
longstanding breakage in heap_update[1], it occurred to me that this
is an excellent example of a case in which the option for which Andres
was agitating - specifically forcing VACUUM to visit ever single page
in the heap - would be useful. [...] . Being able
to simply force every page to be visited is better. Patch for that
attached. I decided to call the option disable_page_skipping rather
than even_frozen_pages because it should also force visiting pages
that are all-visible but not all-frozen.

However, I also think that the approach described above - providing a
way to excise VM forks specifically - is useful.

I think both make sense.

From 010e99b403ec733d50c71a7d4ef646b1b446ef07 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jun 2016 22:52:58 -0400
Subject: [PATCH 2/2] Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.

Furthermore,
+      except when performing an aggressive vacuum, some pages may be skipped
+      in order to avoid waiting for other sessions to finish using them.

Isn't that still the case? We'll not wait for a cleanup lock and such,
if not necessary?

/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive);
+static void lazy_scan_heap(Relation onerel, int options,
+			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
+			   bool aggressive);

Generally I think it's better to pass bitmaks arguments as an unsigned
integer. But since other related routines already use int...

/*
-	 * We request an aggressive scan if either the table's frozen Xid is now
+	 * We request an aggressive scan if the table's frozen Xid is now
* older than or equal to the requested Xid full-table scan limit; or if
* the table's minimum MultiXactId is older than or equal to the requested
-	 * mxid full-table scan limit.
+	 * mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was specified.
*/
aggressive = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
xidFullScanLimit);
aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
mxactFullScanLimit);
+	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+		aggressive = true;

I'm inclined to convert the agressive |= to an if, it looks a bit
jarring if consecutive assignments use differing forms.

static void
-lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
+lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
Relation *Irel, int nindexes, bool aggressive)
{
BlockNumber nblocks,
@@ -542,25 +545,28 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
* the last page.  This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
-	for (next_unskippable_block = 0;
-		 next_unskippable_block < nblocks;
-		 next_unskippable_block++)
+	next_unskippable_block = 0;
+	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
-		uint8		vmstatus;
-
-		vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
-											&vmbuffer);
-		if (aggressive)
+		while (next_unskippable_block < nblocks)
{
-			if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-				break;
-		}
-		else
-		{
-			if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-				break;
+			uint8		vmstatus;
+
+			vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
+												&vmbuffer);
+			if (aggressive)
+			{
+				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+					break;
+			}
+			else
+			{
+				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+					break;
+			}
+			vacuum_delay_point();
+			next_unskippable_block++;
}
-		vacuum_delay_point();
}
if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
@@ -594,26 +600,29 @@ lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
if (blkno == next_unskippable_block)
{
/* Time to advance next_unskippable_block */
-			for (next_unskippable_block++;
-				 next_unskippable_block < nblocks;
-				 next_unskippable_block++)
+			next_unskippable_block++;
+			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
-				uint8		vmskipflags;
-
-				vmskipflags = visibilitymap_get_status(onerel,
-													   next_unskippable_block,
-													   &vmbuffer);
-				if (aggressive)
+				while (next_unskippable_block < nblocks)
{
-					if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-						break;
-				}
-				else
-				{
-					if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-						break;
+					uint8		vmskipflags;
+
+					vmskipflags = visibilitymap_get_status(onerel,
+													  next_unskippable_block,
+														   &vmbuffer);
+					if (aggressive)
+					{
+						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
+							break;
+					}
+					else
+					{
+						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
+							break;
+					}
+					vacuum_delay_point();
+					next_unskippable_block++;
}
-				vacuum_delay_point();
}

This code really makes me unhappy, everytime I see it.

+			| IDENT
+				{
+					if (strcmp($1, "disable_page_skipping") == 0)
+						$$ = VACOPT_DISABLE_PAGE_SKIPPING;
+					else
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+							 errmsg("unrecognized VACUUM option \"%s\"", $1),
+									 parser_errposition(@1)));
+				}
;

If we could get rid of that indentation behaviour by pgindent, I'd be
eclectic.

/*
+ * Remove the visibility map fork for a relation.  If there turn out to be
+ * any bugs in the visibility map code that require rebuilding the VM, this
+ * provides users with a way to do it that is cleaner than shutting down the
+ * server and removing files by hand.
+ *
+ * This is a cut-down version of RelationTruncate.
+ */
+Datum
+pg_truncate_visibility_map(PG_FUNCTION_ARGS)
+{
+	Oid			relid = PG_GETARG_OID(0);
+	Relation	rel;
+
+	rel = relation_open(relid, AccessExclusiveLock);
+
+	if (rel->rd_rel->relkind != RELKIND_RELATION &&
+		rel->rd_rel->relkind != RELKIND_MATVIEW &&
+		rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+		   errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+				  RelationGetRelationName(rel))));
+
+	RelationOpenSmgr(rel);
+	rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
+
+	visibilitymap_truncate(rel, 0);
+
+	if (RelationNeedsWAL(rel))
+	{
+		xl_smgr_truncate	xlrec;
+
+		xlrec.blkno = 0;
+		xlrec.rnode = rel->rd_node;
+		xlrec.flags = SMGR_TRUNCATE_VM;
+
+		XLogBeginInsert();
+		XLogRegisterData((char *) &xlrec, sizeof(xlrec));
+
+		XLogInsert(RM_SMGR_ID, XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
+	}

Having an XLogInsert() in contrib makes me more than a bit squeamish. I
think it'd be fair bit better to have that section of code in
visibilitymap.c, and then call that from the extension.

+	/* Don't keep the relation locked any longer than necessary! */
+	relation_close(rel, AccessExclusiveLock);

Don't think that's a good idea. We use transactional semantics for a
good reason, and the function returns afterwards anyway.

I'm also thinking that we should perform owner-checks in these
functions. Sure, by default they're superuser only. But there's
legitimate reason to grant execute to other users, but that shouldn't
automatically mean they can do everything. That's probably done best in
a separate commit tho.

+/* flags for xl_smgr_truncate */
+#define SMGR_TRUNCATE_HEAP		0x0001
+#define SMGR_TRUNCATE_VM		0x0002
+#define SMGR_TRUNCATE_FSM		0x0004
+#define SMGR_TRUNCATE_ALL		\
+	(SMGR_TRUNCATE_HEAP|SMGR_TRUNCATE_VM|SMGR_TRUNCATE_FSM)
+

Hm. I wonder if somebody will forget to update this the next time a new
fork is introduced...

- Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#6Robert Haas
robertmhaas@gmail.com
In reply to: Andres Freund (#5)
Re: forcing a rebuild of the visibility map

On Fri, Jun 17, 2016 at 2:48 PM, Andres Freund <andres@anarazel.de> wrote:

From 010e99b403ec733d50c71a7d4ef646b1b446ef07 Mon Sep 17 00:00:00 2001
From: Robert Haas <rhaas@postgresql.org>
Date: Wed, 15 Jun 2016 22:52:58 -0400
Subject: [PATCH 2/2] Add VACUUM (DISABLE_PAGE_SKIPPING) for emergencies.

Furthermore,
+      except when performing an aggressive vacuum, some pages may be skipped
+      in order to avoid waiting for other sessions to finish using them.

Isn't that still the case? We'll not wait for a cleanup lock and such,
if not necessary?

That's just there to explain what behavior gets changed when you
specify DISABLE_PAGE_SKIPPING.

/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
-                        Relation *Irel, int nindexes, bool aggressive);
+static void lazy_scan_heap(Relation onerel, int options,
+                        LVRelStats *vacrelstats, Relation *Irel, int nindexes,
+                        bool aggressive);

Generally I think it's better to pass bitmaks arguments as an unsigned
integer. But since other related routines already use int...

Many years ago, I was told by Tom Lane that project standard was int.
My gut was to use unsigned too, but I've got better things to do than
argue about it - and project style is a legitimate argument in any
case.

/*
-      * We request an aggressive scan if either the table's frozen Xid is now
+      * We request an aggressive scan if the table's frozen Xid is now
* older than or equal to the requested Xid full-table scan limit; or if
* the table's minimum MultiXactId is older than or equal to the requested
-      * mxid full-table scan limit.
+      * mxid full-table scan limit; or if DISABLE_PAGE_SKIPPING was specified.
*/
aggressive = TransactionIdPrecedesOrEquals(onerel->rd_rel->relfrozenxid,
xidFullScanLimit);
aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
mxactFullScanLimit);
+     if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+             aggressive = true;

I'm inclined to convert the agressive |= to an if, it looks a bit
jarring if consecutive assignments use differing forms.

I thought about that, but I like it better the way I have it.

This code really makes me unhappy, everytime I see it.

That's not a defect in this patch.

+                     | IDENT
+                             {
+                                     if (strcmp($1, "disable_page_skipping") == 0)
+                                             $$ = VACOPT_DISABLE_PAGE_SKIPPING;
+                                     else
+                                             ereport(ERROR,
+                                                             (errcode(ERRCODE_SYNTAX_ERROR),
+                                                      errmsg("unrecognized VACUUM option \"%s\"", $1),
+                                                                      parser_errposition(@1)));
+                             }
;

If we could get rid of that indentation behaviour by pgindent, I'd be
eclectic.

Neither is that, although the indentation there looks fine to me.

/*
+ * Remove the visibility map fork for a relation.  If there turn out to be
+ * any bugs in the visibility map code that require rebuilding the VM, this
+ * provides users with a way to do it that is cleaner than shutting down the
+ * server and removing files by hand.
+ *
+ * This is a cut-down version of RelationTruncate.
+ */
+Datum
+pg_truncate_visibility_map(PG_FUNCTION_ARGS)
+{
+     Oid                     relid = PG_GETARG_OID(0);
+     Relation        rel;
+
+     rel = relation_open(relid, AccessExclusiveLock);
+
+     if (rel->rd_rel->relkind != RELKIND_RELATION &&
+             rel->rd_rel->relkind != RELKIND_MATVIEW &&
+             rel->rd_rel->relkind != RELKIND_TOASTVALUE)
+             ereport(ERROR,
+                             (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+                errmsg("\"%s\" is not a table, materialized view, or TOAST table",
+                               RelationGetRelationName(rel))));
+
+     RelationOpenSmgr(rel);
+     rel->rd_smgr->smgr_vm_nblocks = InvalidBlockNumber;
+
+     visibilitymap_truncate(rel, 0);
+
+     if (RelationNeedsWAL(rel))
+     {
+             xl_smgr_truncate        xlrec;
+
+             xlrec.blkno = 0;
+             xlrec.rnode = rel->rd_node;
+             xlrec.flags = SMGR_TRUNCATE_VM;
+
+             XLogBeginInsert();
+             XLogRegisterData((char *) &xlrec, sizeof(xlrec));
+
+             XLogInsert(RM_SMGR_ID, XLOG_SMGR_TRUNCATE | XLR_SPECIAL_REL_UPDATE);
+     }

Having an XLogInsert() in contrib makes me more than a bit squeamish. I
think it'd be fair bit better to have that section of code in
visibilitymap.c, and then call that from the extension.

I thought about that too, but it seemed like it was just bloating the
core server for no real reason. It's not like contrib is off in space
someplace.

+     /* Don't keep the relation locked any longer than necessary! */
+     relation_close(rel, AccessExclusiveLock);

Don't think that's a good idea. We use transactional semantics for a
good reason, and the function returns afterwards anyway.

Yeah, but if you want to blow away a bunch of visibility map forks at
one go, you're not going to appreciate this function collecting all of
those locks at the same time. This is also why VACUUM starts a
separate transaction for each table.

I'm also thinking that we should perform owner-checks in these
functions. Sure, by default they're superuser only. But there's
legitimate reason to grant execute to other users, but that shouldn't
automatically mean they can do everything. That's probably done best in
a separate commit tho.

I think it is a very dubious idea to grant access to these functions
to anyone other than the superuser, but if you want to add an owner
check, go for it!

+/* flags for xl_smgr_truncate */
+#define SMGR_TRUNCATE_HEAP           0x0001
+#define SMGR_TRUNCATE_VM             0x0002
+#define SMGR_TRUNCATE_FSM            0x0004
+#define SMGR_TRUNCATE_ALL            \
+     (SMGR_TRUNCATE_HEAP|SMGR_TRUNCATE_VM|SMGR_TRUNCATE_FSM)
+

Hm. I wonder if somebody will forget to update this the next time a new
fork is introduced...

Gosh, I hope we go for getting rid of some of these forks instead of
adding more, but yeah, that could possibly happen. I don't think it's
a big cause for concern, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#7Robert Haas
robertmhaas@gmail.com
In reply to: Robert Haas (#6)
Re: forcing a rebuild of the visibility map

Since time is short and it seems better to get the xlog page magic
bump done before beta2, I've gone ahead and committed this, but there
are two points here that seem to warrant some input from other senior
hackers. Andres and I have discussed these points off list but
without reaching a meeting of the minds.

On Fri, Jun 17, 2016 at 2:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Having an XLogInsert() in contrib makes me more than a bit squeamish. I
think it'd be fair bit better to have that section of code in
visibilitymap.c, and then call that from the extension.

I thought about that too, but it seemed like it was just bloating the
core server for no real reason. It's not like contrib is off in space
someplace.

So, Andres thinks it's a bad idea to have an XLogInsert() call in
contrib, and I don't think it matters. Anyone else have an opinion?
Andres, do you want to explain the nature of your concern further?

+     /* Don't keep the relation locked any longer than necessary! */
+     relation_close(rel, AccessExclusiveLock);

Don't think that's a good idea. We use transactional semantics for a
good reason, and the function returns afterwards anyway.

Yeah, but if you want to blow away a bunch of visibility map forks at
one go, you're not going to appreciate this function collecting all of
those locks at the same time. This is also why VACUUM starts a
separate transaction for each table.

Andres has a concern here that there might be a problem related to
invalidation messages. Currently, the only invalidation message
that's getting generated here is a non-transactional smgr
invalidation, which is sent before the lock is released and therefore
no race exists. However, Andres is concerned that this may at a
minimum not be very future-proof and possibly unsafe even today. To
me, however, this doesn't seem particularly different than what e.g.
lazy_truncate_heap() always does and has done for a long time. More
input here is welcome and appreciated - it would be good not to screw
this up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#8Michael Paquier
michael.paquier@gmail.com
In reply to: Robert Haas (#7)
Re: forcing a rebuild of the visibility map

On Sat, Jun 18, 2016 at 6:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Jun 17, 2016 at 2:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:

Having an XLogInsert() in contrib makes me more than a bit squeamish. I
think it'd be fair bit better to have that section of code in
visibilitymap.c, and then call that from the extension.

I thought about that too, but it seemed like it was just bloating the
core server for no real reason. It's not like contrib is off in space
someplace.

So, Andres thinks it's a bad idea to have an XLogInsert() call in
contrib, and I don't think it matters. Anyone else have an opinion?

I am meh as well with this practice that we should not encourage. xlog
insertion routines should just be generated by in-core code. We have
now the generic WAL interface for extensions, and if need be it should
be extended.

Andres, do you want to explain the nature of your concern further?

I am not in his mind, but my guess is that contrib modules are
sometimes used as template examples by other people, and encouraging
users to use those routines in modules would increase the risk to
misuse them, aka badly-formed records that could corrupt the system.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#9Tom Lane
tgl@sss.pgh.pa.us
In reply to: Michael Paquier (#8)
Re: forcing a rebuild of the visibility map

Michael Paquier <michael.paquier@gmail.com> writes:

On Sat, Jun 18, 2016 at 6:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Andres, do you want to explain the nature of your concern further?

I am not in his mind, but my guess is that contrib modules are
sometimes used as template examples by other people, and encouraging
users to use those routines in modules would increase the risk to
misuse them, aka badly-formed records that could corrupt the system.

I don't follow that argument. People writing new extensions are just
as likely to copy from core code as contrib.

If Andres' concern is that XLogInsert isn't a very stable API, maybe
we could address that by providing some wrapper function that knows
how to emit the specific kind of record that pg_visibility needs.
But on the whole it seems like make-work, unless there's a reason
to believe that other extensions will need to generate that exact
same record type.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10Andres Freund
andres@anarazel.de
In reply to: Tom Lane (#9)
Re: forcing a rebuild of the visibility map

On 2016-06-18 11:56:51 -0400, Tom Lane wrote:

Michael Paquier <michael.paquier@gmail.com> writes:

On Sat, Jun 18, 2016 at 6:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

Andres, do you want to explain the nature of your concern further?

I am not in his mind, but my guess is that contrib modules are
sometimes used as template examples by other people, and encouraging
users to use those routines in modules would increase the risk to
misuse them, aka badly-formed records that could corrupt the system.

That's not it, no.

If Andres' concern is that XLogInsert isn't a very stable API, maybe
we could address that by providing some wrapper function that knows
how to emit the specific kind of record that pg_visibility needs.

That's part of the concern I have, yes. It's pretty common that during
minor version updates contrib modules are updated before the main server
is restarted. Increasing the coupling on something as critical as WAL
logging doesn't strike me as a good idea.

I also don't see why it's a good idea to have knowledge about how to
truncate the visibility map outside of visibilitymap.c. Having that in a
contrib module just seems like a modularity violation. To me this
should be a wal_log paramter to visibilitymap_truncate().

Andres

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11Tom Lane
tgl@sss.pgh.pa.us
In reply to: Andres Freund (#10)
Re: forcing a rebuild of the visibility map

Andres Freund <andres@anarazel.de> writes:

I also don't see why it's a good idea to have knowledge about how to
truncate the visibility map outside of visibilitymap.c. Having that in a
contrib module just seems like a modularity violation.

That seems like a pretty good argument.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12Robert Haas
robertmhaas@gmail.com
In reply to: Tom Lane (#11)
Re: forcing a rebuild of the visibility map

On Mon, Jun 20, 2016 at 4:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@anarazel.de> writes:

I also don't see why it's a good idea to have knowledge about how to
truncate the visibility map outside of visibilitymap.c. Having that in a
contrib module just seems like a modularity violation.

That seems like a pretty good argument.

I agree that's a good argument but it passes over one or two points
which motivated my approach. Most of the work of truncating the
visibility map is, in fact, encapsulated by visibilitymap_truncate, so
the only knowledge we're exporting to the contrib module is what WAL
record to emit. I put that in the caller of visibilitymap_truncate
because the only existing caller of visibilitymap_truncate is
RelationTruncate, which also knows what WAL record to emit on behalf
of visibilitymap.c. So I don't think I've exported any more knowledge
from visibilitymap.c than was the case previously.

That having been said, if somebody wants to whack this around, I am
not going to put up a big fight.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers