Resume vacuum and autovacuum from interruption and cancellation
Hi all,
A long-running vacuum can sometimes be cancelled by the administrator, and
autovacuum can be cancelled by concurrent processes. Even when vacuum
retries after a cancellation, it always restarts from the first block of
the table, so it may vacuum blocks again that were already vacuumed last
time. We have the visibility map to skip scanning all-visible blocks, but
when the table is large and frequently modified, we are more likely to
reclaim garbage from the blocks beyond the point we reached last time than
by scanning again from the first block.
So I'd like to propose making vacuum save its progress and resume
vacuuming based on it. The mechanism I have in mind is simple: vacuum
periodically reports the current block number to the stats collector.
If the table has indexes, it reports after each heap vacuum pass; if there
are no indexes, it reports every fixed number of blocks (e.g. 1024 blocks = 8MB).
We can see that value in a new column, vacuum_resume_block, of
pg_stat_all_tables. I'm going to add one VACUUM command option, RESUME,
and one new reloption, vacuum_resume. If the option is true, vacuum
fetches the block number from the stats collector before starting and
begins vacuuming from that block. I wonder if we could make it true by
default for autovacuum, but it must be false for aggressive vacuum.
If we start vacuuming from a block other than the first, we can update
neither relfrozenxid nor relminmxid. And we might not be able to update
even the relation statistics.
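For concreteness, the intended workflow would look roughly like this (a sketch only: the vacuum_resume_block column, the RESUME option, and the vacuum_resume reloption are the additions proposed here, not existing features):

```sql
-- A long-running vacuum on a large table gets cancelled partway through.
VACUUM (VERBOSE) big_table;          -- cancelled by the administrator

-- The last reported block number is kept by the stats collector:
SELECT relname, vacuum_resume_block
  FROM pg_stat_all_tables
 WHERE relname = 'big_table';

-- Resume from that block instead of starting over at block 0:
VACUUM (RESUME TRUE) big_table;

-- Or make resuming the default for this table and its TOAST table:
ALTER TABLE big_table SET (vacuum_resume = true, toast.vacuum_resume = true);
```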
Comments and feedback are very welcome.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Wed, Jun 12, 2019 at 1:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Attached is the first version of the patch. I have also registered this
item for the next commitfest.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
0001-Add-RESUME-option-to-VACUUM-and-autovacuum.patch (text/x-patch)
From 74c035d887c8f76b43414ce80559794a399ebaf7 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 10 Jul 2019 19:02:56 +0900
Subject: [PATCH] Add RESUME option to VACUUM and autovacuum.
This commit adds a new reloption, vacuum_resume, which controls
whether vacuum attempts to resume vacuuming from the last vacuumed
block saved in the vacuum_resume_block column of pg_stat_all_tables.
It also adds a new option to the VACUUM command, RESUME, which can
be used to override the reloption.
---
doc/src/sgml/monitoring.sgml | 5 ++
doc/src/sgml/ref/vacuum.sgml | 18 +++++
src/backend/access/common/reloptions.c | 13 +++-
src/backend/access/heap/vacuumlazy.c | 91 ++++++++++++++++++++++++--
src/backend/catalog/system_views.sql | 1 +
src/backend/commands/vacuum.c | 13 ++++
src/backend/postmaster/pgstat.c | 42 ++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 14 ++++
src/include/catalog/pg_proc.dat | 5 ++
src/include/commands/vacuum.h | 5 +-
src/include/pgstat.h | 14 ++++
src/include/utils/rel.h | 2 +
src/test/regress/expected/rules.out | 3 +
src/test/regress/expected/vacuum.out | 20 ++++++
src/test/regress/sql/vacuum.sql | 21 ++++++
15 files changed, 258 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..fe68113b02 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2784,6 +2784,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><type>bigint</type></entry>
<entry>Estimated number of rows modified since this table was last analyzed</entry>
</row>
+ <row>
+ <entry><structfield>vacuum_resume_block</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Block number to resume vacuuming from</entry>
+ </row>
<row>
<entry><structfield>last_vacuum</structfield></entry>
<entry><type>timestamp with time zone</type></entry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index f9b0fb8794..0b8733d555 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
INDEX_CLEANUP [ <replaceable class="parameter">boolean</replaceable> ]
TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
+ RESUME [ <replaceable class="parameter">boolean</replaceable> ]
<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
@@ -223,6 +224,23 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>RESUME</literal></term>
+ <listitem>
+ <para>
+ Specifies that <command>VACUUM</command> should attempt to
+ resume vacuuming from the last vacuumed block, saved in the
+ <literal>vacuum_resume_block</literal> column of
+ <xref linkend="pg-stat-all-tables-view"/>. This behavior is helpful
+ for resuming vacuuming after an interruption or cancellation. The default
+ is false unless the <literal>vacuum_resume</literal> reloption has been
+ set to true. This option is ignored if the <literal>FULL</literal>,
+ <literal>FREEZE</literal>, or <literal>DISABLE_PAGE_SKIPPING</literal>
+ option is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><replaceable class="parameter">boolean</replaceable></term>
<listitem>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 5773021499..6494c3bdfd 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -158,6 +158,15 @@ static relopt_bool boolRelOpts[] =
},
true
},
+ {
+ {
+ "vacuum_resume",
+ "Enables vacuum to resume from the last vacuumed block",
+ RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+ ShareUpdateExclusiveLock
+ },
+ false
+ },
/* list terminator */
{{NULL}}
};
@@ -1412,7 +1421,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
{"vacuum_index_cleanup", RELOPT_TYPE_BOOL,
offsetof(StdRdOptions, vacuum_index_cleanup)},
{"vacuum_truncate", RELOPT_TYPE_BOOL,
- offsetof(StdRdOptions, vacuum_truncate)}
+ offsetof(StdRdOptions, vacuum_truncate)},
+ {"vacuum_resume", RELOPT_TYPE_BOOL,
+ offsetof(StdRdOptions, vacuum_resume)}
};
options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1df3b..e62a07654e 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -91,6 +91,13 @@
#define VACUUM_FSM_EVERY_PAGES \
((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+/*
+ * When a table has no indexes, save the progress every 8GB so that we can
+ * resume vacuum from the middle of table. When table has indexes we save it
+ * after the second heap pass finished.
+ */
+#define VACUUM_RESUME_BLK_INTERVAL 1024 /* 8MB */
+
/*
* Guesstimation of number of dead tuples per page. This is used to
* provide an upper limit to memory allocated when vacuuming small
@@ -175,6 +182,7 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static BlockNumber get_resume_block(Relation onerel);
/*
@@ -212,6 +220,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
Assert(params != NULL);
Assert(params->index_cleanup != VACOPT_TERNARY_DEFAULT);
Assert(params->truncate != VACOPT_TERNARY_DEFAULT);
+ Assert(params->resume != VACOPT_TERNARY_DEFAULT);
/* not every AM requires these to be valid, but heap does */
Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
@@ -504,7 +513,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
TransactionId relminmxid = onerel->rd_rel->relminmxid;
BlockNumber empty_pages,
vacuumed_pages,
- next_fsm_block_to_vacuum;
+ next_fsm_block_to_vacuum,
+ next_block_to_resume;
double num_tuples, /* total number of nonremovable tuples */
live_tuples, /* live tuples (reltuples estimate) */
tups_vacuumed, /* tuples cleaned up by vacuum */
@@ -515,6 +525,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
+ BlockNumber start_blkno = 0;
bool skipping_blocks;
xl_heap_freeze_tuple *frozen;
StringInfoData buf;
@@ -527,6 +538,19 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
pg_rusage_init(&ru0);
+ /*
+ * If resuming is not requested, we clear the last saved block so as not to
+ * keep the previous information. If requested and it is not an aggressive
+ * vacuum, we fetch the last saved block number to resume and set it as the
+ * starting block to vacuum.
+ */
+ if (params->resume == VACOPT_TERNARY_DISABLED)
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+ else if (!aggressive)
+ start_blkno = get_resume_block(onerel);
+
relname = RelationGetRelationName(onerel);
if (aggressive)
ereport(elevel,
@@ -534,19 +558,30 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
get_namespace_name(RelationGetNamespace(onerel)),
relname)));
else
- ereport(elevel,
- (errmsg("vacuuming \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- relname)));
+ {
+ if (start_blkno != 0)
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\" from block %u",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname, start_blkno)));
+ else
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\"",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname)));
+ }
empty_pages = vacuumed_pages = 0;
next_fsm_block_to_vacuum = (BlockNumber) 0;
+ next_block_to_resume = (BlockNumber) 0;
num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
indstats = (IndexBulkDeleteResult **)
palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
nblocks = RelationGetNumberOfBlocks(onerel);
+ Assert(start_blkno <= nblocks); /* both are equal iff the table is empty */
+
vacrelstats->rel_pages = nblocks;
vacrelstats->scanned_pages = 0;
vacrelstats->tupcount_pages = 0;
@@ -606,7 +641,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
- next_unskippable_block = 0;
+ next_unskippable_block = start_blkno;
if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
while (next_unskippable_block < nblocks)
@@ -635,7 +670,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;
- for (blkno = 0; blkno < nblocks; blkno++)
+ for (blkno = start_blkno; blkno < nblocks; blkno++)
{
Buffer buf;
Page page;
@@ -799,6 +834,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
next_fsm_block_to_vacuum = blkno;
+ /* Save the current block number to resume vacuuming */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+
/* Report that we are once again scanning the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);
@@ -1271,6 +1311,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
*/
vacrelstats->num_dead_tuples = 0;
+ /* Save the current block number to resume vacuuming */
+ if (blkno - next_block_to_resume >= VACUUM_RESUME_BLK_INTERVAL)
+ {
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+ next_block_to_resume = blkno;
+ }
+
/*
* Periodically do incremental FSM vacuuming to make newly-freed
* space visible on upper FSM pages. Note: although we've cleaned
@@ -1458,6 +1507,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
if (blkno > next_fsm_block_to_vacuum)
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+ /* Clear the saved block number */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+
/* report all blocks vacuumed; and that we're cleaning up */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
@@ -2354,3 +2408,26 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
return all_visible;
}
+
+/*
+ * Return the block number to resume vacuuming from stats collector.
+ */
+static BlockNumber
+get_resume_block(Relation onerel)
+{
+ Oid oid = RelationGetRelid(onerel);
+ bool found;
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+
+ dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+ tabentry = hash_search(dbentry->tables, (void *) &oid,
+ HASH_FIND, &found);
+
+ /* No found valid saved block number, resume from the first block */
+ if (!found ||
+ tabentry->vacuum_resume_block >= RelationGetNumberOfBlocks(onerel))
+ return (BlockNumber) 0;
+
+ return tabentry->vacuum_resume_block;
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..3e669561df 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -566,6 +566,7 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_live_tuples(C.oid) AS n_live_tup,
pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(C.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(C.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
pg_stat_get_last_analyze_time(C.oid) as last_analyze,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e7b379dfda..4602a96ec4 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -99,6 +99,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
/* Set default value */
params.index_cleanup = VACOPT_TERNARY_DEFAULT;
params.truncate = VACOPT_TERNARY_DEFAULT;
+ params.resume = VACOPT_TERNARY_DEFAULT;
/* Parse options list */
foreach(lc, vacstmt->options)
@@ -127,6 +128,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
disable_page_skipping = defGetBoolean(opt);
else if (strcmp(opt->defname, "index_cleanup") == 0)
params.index_cleanup = get_vacopt_ternary_value(opt);
+ else if (strcmp(opt->defname, "resume") == 0)
+ params.resume = get_vacopt_ternary_value(opt);
else if (strcmp(opt->defname, "truncate") == 0)
params.truncate = get_vacopt_ternary_value(opt);
else
@@ -1774,6 +1777,16 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
params->truncate = VACOPT_TERNARY_DISABLED;
}
+ /* Set resume option based on reloptions if not yet, default is false */
+ if (params->resume == VACOPT_TERNARY_DEFAULT)
+ {
+ if (onerel->rd_options == NULL ||
+ !((StdRdOptions *) onerel->rd_options)->vacuum_resume)
+ params->resume = VACOPT_TERNARY_DISABLED;
+ else
+ params->resume = VACOPT_TERNARY_ENABLED;
+ }
+
/*
* Remember the relation's TOAST relation for later, if the caller asked
* us to process it. In VACUUM FULL, though, the toast table is
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28b51..1d75240081 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -328,6 +328,8 @@ static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, in
static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len);
static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
+static void pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg,
+ int len);
static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
@@ -1425,6 +1427,27 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
pgstat_send(&msg, sizeof(msg));
}
+/* ---------
+ * pgstat_report_vacuum_resume_block() -
+ *
+ * Tell the collector about the block number to resume.
+ * ---------
+ */
+void
+pgstat_report_vacuum_resume_block(Oid tableoid, bool shared, BlockNumber blkno)
+{
+ PgStat_MsgVacuumResumeBlock msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACUUMRESUMEBLOCK);
+ msg.m_databaseid = shared ? InvalidOid : MyDatabaseId;
+ msg.m_tableoid = tableoid;
+ msg.m_blkno = blkno;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* --------
* pgstat_report_analyze() -
*
@@ -4594,6 +4617,11 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_vacuum(&msg.msg_vacuum, len);
break;
+ case PGSTAT_MTYPE_VACUUMRESUMEBLOCK:
+ pgstat_recv_vacuum_resume_block(&msg.msg_vacuum_resume_block,
+ len);
+ break;
+
case PGSTAT_MTYPE_ANALYZE:
pgstat_recv_analyze(&msg.msg_analyze, len);
break;
@@ -6230,6 +6258,20 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
}
}
+static void
+pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+
+ /*
+ * Store the data in the table's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+ tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
+ tabentry->vacuum_resume_block = msg->m_blkno;
+}
+
/* ----------
* pgstat_recv_analyze() -
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..9a9872caaf 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -195,6 +195,20 @@ pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_vacuum_resume_block(PG_FUNCTION_ARGS)
+{
+ Oid relid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatTabEntry *tabentry;
+
+ if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (tabentry->vacuum_resume_block);
+
+ PG_RETURN_INT64(result);
+}
Datum
pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 0902dce5f1..f25be17944 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5090,6 +5090,11 @@
proname => 'pg_stat_get_mod_since_analyze', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_mod_since_analyze' },
+{ oid => '4219',
+ descr => 'statistics: block number to resume vacuuming',
+ proname => 'pg_stat_get_vacuum_resume_block', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_vacuum_resume_block' },
{ oid => '1934', descr => 'statistics: number of blocks fetched',
proname => 'pg_stat_get_blocks_fetched', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 128f7ae65d..c3006c52a3 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -145,7 +145,8 @@ typedef enum VacuumOption
VACOPT_FULL = 1 << 4, /* FULL (non-concurrent) vacuum */
VACOPT_SKIP_LOCKED = 1 << 5, /* skip if cannot get lock */
VACOPT_SKIPTOAST = 1 << 6, /* don't process the TOAST table, if any */
- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7, /* don't skip any pages */
+ VACOPT_RESUME = 1 << 8 /* resume from the previous point */
} VacuumOption;
/*
@@ -184,6 +185,8 @@ typedef struct VacuumParams
* default value depends on reloptions */
VacOptTernaryValue truncate; /* Truncate empty pages at the end,
* default value depends on reloptions */
+ VacOptTernaryValue resume; /* Resume vacuuming from the last vacuumed
+ * block */
} VacuumParams;
/* GUC parameters */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a188..5c662039e4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -57,6 +57,7 @@ typedef enum StatMsgType
PGSTAT_MTYPE_RESETSINGLECOUNTER,
PGSTAT_MTYPE_AUTOVAC_START,
PGSTAT_MTYPE_VACUUM,
+ PGSTAT_MTYPE_VACUUMRESUMEBLOCK,
PGSTAT_MTYPE_ANALYZE,
PGSTAT_MTYPE_ARCHIVER,
PGSTAT_MTYPE_BGWRITER,
@@ -372,6 +373,14 @@ typedef struct PgStat_MsgVacuum
PgStat_Counter m_dead_tuples;
} PgStat_MsgVacuum;
+typedef struct PgStat_MsgVacuumResumeBlock
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ Oid m_tableoid;
+ BlockNumber m_blkno;
+} PgStat_MsgVacuumResumeBlock;
+
/* ----------
* PgStat_MsgAnalyze Sent by the backend or autovacuum daemon
@@ -562,6 +571,7 @@ typedef union PgStat_Msg
PgStat_MsgResetsinglecounter msg_resetsinglecounter;
PgStat_MsgAutovacStart msg_autovacuum_start;
PgStat_MsgVacuum msg_vacuum;
+ PgStat_MsgVacuumResumeBlock msg_vacuum_resume_block;
PgStat_MsgAnalyze msg_analyze;
PgStat_MsgArchiver msg_archiver;
PgStat_MsgBgWriter msg_bgwriter;
@@ -651,6 +661,8 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ BlockNumber vacuum_resume_block;
+
TimestampTz vacuum_timestamp; /* user initiated vacuum */
PgStat_Counter vacuum_count;
TimestampTz autovac_vacuum_timestamp; /* autovacuum initiated */
@@ -1263,6 +1275,8 @@ extern void pgstat_reset_single_counter(Oid objectid, PgStat_Single_Reset_Type t
extern void pgstat_report_autovac(Oid dboid);
extern void pgstat_report_vacuum(Oid tableoid, bool shared,
PgStat_Counter livetuples, PgStat_Counter deadtuples);
+extern void pgstat_report_vacuum_resume_block(Oid tableoid, bool shared,
+ BlockNumber blkno);
extern void pgstat_report_analyze(Relation rel,
PgStat_Counter livetuples, PgStat_Counter deadtuples,
bool resetcounter);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d35b4a5061..3dda8aa019 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -270,6 +270,8 @@ typedef struct StdRdOptions
int parallel_workers; /* max number of parallel workers */
bool vacuum_index_cleanup; /* enables index vacuuming and cleanup */
bool vacuum_truncate; /* enables vacuum to truncate a relation */
+ bool vacuum_resume; /* enables vacuum to resume from the last
+ * vacuumed block */
} StdRdOptions;
#define HEAP_MIN_FILLFACTOR 10
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..1edd97fdc6 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1771,6 +1771,7 @@ pg_stat_all_tables| SELECT c.oid AS relid,
pg_stat_get_live_tuples(c.oid) AS n_live_tup,
pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(c.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(c.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,
pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,
pg_stat_get_last_analyze_time(c.oid) AS last_analyze,
@@ -2000,6 +2001,7 @@ pg_stat_sys_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
@@ -2043,6 +2045,7 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 02c53e3058..a0033e23aa 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -148,6 +148,25 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -213,6 +232,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
CREATE TABLE vacowned_parted (a int) PARTITION BY LIST (a);
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 6ffb495546..dd2606a7dc 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -121,6 +121,26 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
+
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -176,6 +196,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
--
2.22.0
On Tue, 16 Jul 2019 at 13:57, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Sounds like an interesting idea, but does it really help? If vacuum
was interrupted previously, wouldn't the next run already find the dead
tuples, etc. quite quickly, since the VM and FSM were already updated
for those pages in the previous run?
A few minor things I noticed in the first look,
+/*
+ * When a table has no indexes, save the progress every 8GB so that we can
+ * resume vacuum from the middle of table. When table has indexes we save it
+ * after the second heap pass finished.
+ */
+#define VACUUM_RESUME_BLK_INTERVAL 1024 /* 8MB */
Discrepancy with the memory unit here.
/* No found valid saved block number, resume from the first block */
Can be better framed.
--
Regards,
Rafia Sabih
On Thu, Aug 8, 2019 at 10:42 PM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
On Tue, 16 Jul 2019 at 13:57, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Wed, Jun 12, 2019 at 1:30 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Hi all,
Long-running vacuum could be sometimes cancelled by administrator. And
autovacuums could be cancelled by concurrent processes. Even if it
retries after cancellation, since it always restart from the first
block of table it could vacuums blocks again that we vacuumed last
time. We have visibility map to skip scanning all-visible blocks but
in case where the table is large and often modified, we're more likely
to reclaim more garbage from blocks other than we processed last time
than scanning from the first block.So I'd like to propose to make vacuums save its progress and resume
vacuuming based on it. The mechanism I'm thinking is simple; vacuums
periodically report the current block number to the stats collector.
If table has indexes, reports it after heap vacuum whereas reports it
every certain amount of blocks (e.g. 1024 blocks = 8MB) if no indexes.
We can see that value on new column vacuum_resume_block of
pg_stat_all_tables. I'm going to add one vacuum command option RESUME
and one new reloption vacuum_resume. If the option is true vacuums
fetch the block number from stats collector before starting and start
vacuuming from that block. I wonder if we could make it true by
default for autovacuums but it must be false when aggressive vacuum.
If we start to vacuum from not first block, we can update neither
relfrozenxid nor relfrozenxmxid. And we might not be able to update
even relation statistics.
Sounds like an interesting idea, but does it really help? Because if
vacuum was interrupted previously, wouldn't it already know the dead
tuples, etc in the next run quite quickly, as the VM, FSM is already
updated for the page in the previous run.
Since tables are modified even during vacuum, if vacuum runs again
after an interruption it may need to vacuum again the part of the table
that has already been cleaned by the last vacuum. But the rest of
the table is likely to have more garbage in many cases. Therefore I
think this would be helpful especially for a case where the table is
large and heavily updated. Even if the table has not been dirtied since
the last vacuum, it can skip already-vacuumed pages by looking at the VM
or the last vacuumed block. I think that it doesn't make things worse
than today's vacuum in many cases.
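To make the intended resume behavior concrete, here is a minimal standalone sketch (hypothetical names, not code from the patch) of the start-block decision described above: resume only when requested and the vacuum is not aggressive, and fall back to block 0 when the saved value is missing or no longer within the table, mirroring the range check in get_resume_block().

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/*
 * Hypothetical model of the patch's start-block decision.
 * "saved_found" says whether the stats collector had an entry for the
 * table, "saved_blkno" is its vacuum_resume_block value, and "nblocks"
 * is the current relation size in blocks.
 */
static BlockNumber
choose_start_block(bool resume, bool aggressive,
                   bool saved_found, BlockNumber saved_blkno,
                   BlockNumber nblocks)
{
    /* resuming not requested, or aggressive vacuum: scan from block 0 */
    if (!resume || aggressive)
        return 0;

    /* no saved value, or it points past the end of the table: restart */
    if (!saved_found || saved_blkno >= nblocks)
        return 0;

    return saved_blkno;
}
```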
A few minor things I noticed in the first look,
Thanks for reviewing the patch.
+/*
+ * When a table has no indexes, save the progress every 8GB so that we can
+ * resume vacuum from the middle of table. When table has indexes we save it
+ * after the second heap pass finished.
+ */
+#define VACUUM_RESUME_BLK_INTERVAL 1024 /* 8MB */
Discrepancy with the memory unit here.
Fixed.
/* No found valid saved block number, resume from the first block */
Can be better framed.
Fixed.
Attached the updated version patch.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
v2-0001-Add-RESUME-option-to-VACUUM-and-autovacuum.patch (application/x-patch)
From 4b38130a7ab369eceeb12f3b4153f47a83d96e23 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 10 Jul 2019 19:02:56 +0900
Subject: [PATCH v2] Add RESUME option to VACUUM and autovacuum.
This commit adds a new reloption, vacuum_resume, which controls
whether vacuum attempts to resume vacuuming from the last vacuumed
block saved in the vacuum_resume_block column of pg_stat_all_tables. It
also adds a new option to the VACUUM command, RESUME, which can be used
to override the reloption.
---
doc/src/sgml/monitoring.sgml | 5 ++
doc/src/sgml/ref/vacuum.sgml | 18 +++++
src/backend/access/common/reloptions.c | 13 +++-
src/backend/access/heap/vacuumlazy.c | 95 ++++++++++++++++++++++++--
src/backend/catalog/system_views.sql | 1 +
src/backend/commands/vacuum.c | 13 ++++
src/backend/postmaster/pgstat.c | 42 ++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 14 ++++
src/include/catalog/pg_proc.dat | 5 ++
src/include/commands/vacuum.h | 5 +-
src/include/pgstat.h | 14 ++++
src/include/utils/rel.h | 2 +
src/test/regress/expected/rules.out | 3 +
src/test/regress/expected/vacuum.out | 20 ++++++
src/test/regress/sql/vacuum.sql | 21 ++++++
15 files changed, 262 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c303..fe68113b02 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2784,6 +2784,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><type>bigint</type></entry>
<entry>Estimated number of rows modified since this table was last analyzed</entry>
</row>
+ <row>
+ <entry><structfield>vacuum_resume_block</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Block number to resume vacuuming from</entry>
+ </row>
<row>
<entry><structfield>last_vacuum</structfield></entry>
<entry><type>timestamp with time zone</type></entry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index f9b0fb8794..0b8733d555 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
INDEX_CLEANUP [ <replaceable class="parameter">boolean</replaceable> ]
TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
+ RESUME [ <replaceable class="parameter">boolean</replaceable> ]
<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
@@ -223,6 +224,23 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>RESUME</literal></term>
+ <listitem>
+ <para>
+ Specifies that <command>VACUUM</command> should attempt to
+ resume vacuuming from the last vacuumed block saved at
+ <literal>vacuum_resume_block</literal> column of
+ <xref linkend="pg-stat-all-tables-view"/>. This behavior is helpful
+ when to resume vacuuming from interruption and cancellation.The default
+ is false unless the <literal>vacuum_resume</literal> option has been
+ set to true. This option is ignored if either the <literal>FULL</literal>,
+ the <literal>FREEZE</literal> or <literal>DISABLE_PAGE_SKIPPING</literal>
+ option is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><replaceable class="parameter">boolean</replaceable></term>
<listitem>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 5773021499..92b2d83ded 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -158,6 +158,15 @@ static relopt_bool boolRelOpts[] =
},
true
},
+ {
+ {
+ "vacuum_resume",
+ "Enables vacuum to resume vacuuming from the last vacuumed block",
+ RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+ ShareUpdateExclusiveLock
+ },
+ false
+ },
/* list terminator */
{{NULL}}
};
@@ -1412,7 +1421,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
{"vacuum_index_cleanup", RELOPT_TYPE_BOOL,
offsetof(StdRdOptions, vacuum_index_cleanup)},
{"vacuum_truncate", RELOPT_TYPE_BOOL,
- offsetof(StdRdOptions, vacuum_truncate)}
+ offsetof(StdRdOptions, vacuum_truncate)},
+ {"vacuum_resume", RELOPT_TYPE_BOOL,
+ offsetof(StdRdOptions, vacuum_resume)}
};
options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1df3b..e90077c4d3 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -91,6 +91,14 @@
#define VACUUM_FSM_EVERY_PAGES \
((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+/*
+ * When a table has no indexes, save the progress every 8GB so that we can
+ * resume vacuum from the middle of table. When table has indexes we save it
+ * after the second heap pass finished.
+ */
+#define VACUUM_RESUME_BLK_INTERVAL \
+ ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+
/*
* Guesstimation of number of dead tuples per page. This is used to
* provide an upper limit to memory allocated when vacuuming small
@@ -175,6 +183,7 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static BlockNumber get_resume_block(Relation onerel);
/*
@@ -212,6 +221,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
Assert(params != NULL);
Assert(params->index_cleanup != VACOPT_TERNARY_DEFAULT);
Assert(params->truncate != VACOPT_TERNARY_DEFAULT);
+ Assert(params->resume != VACOPT_TERNARY_DEFAULT);
/* not every AM requires these to be valid, but heap does */
Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
@@ -504,7 +514,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
TransactionId relminmxid = onerel->rd_rel->relminmxid;
BlockNumber empty_pages,
vacuumed_pages,
- next_fsm_block_to_vacuum;
+ next_fsm_block_to_vacuum,
+ next_block_to_resume;
double num_tuples, /* total number of nonremovable tuples */
live_tuples, /* live tuples (reltuples estimate) */
tups_vacuumed, /* tuples cleaned up by vacuum */
@@ -515,6 +526,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
+ BlockNumber start_blkno = 0;
bool skipping_blocks;
xl_heap_freeze_tuple *frozen;
StringInfoData buf;
@@ -527,6 +539,19 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
pg_rusage_init(&ru0);
+ /*
+ * If resuming is not requested, we clear the last saved block so as not
+ * keep the previous information. If requested and it is not an aggressive
+ * vacuum, we fetch the last saved block number to resume and set it as the
+ * starting block to vacuum.
+ */
+ if (params->resume == VACOPT_TERNARY_DISABLED)
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+ else if (!aggressive)
+ start_blkno = get_resume_block(onerel);
+
relname = RelationGetRelationName(onerel);
if (aggressive)
ereport(elevel,
@@ -534,19 +559,30 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
get_namespace_name(RelationGetNamespace(onerel)),
relname)));
else
- ereport(elevel,
- (errmsg("vacuuming \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- relname)));
+ {
+ if (start_blkno != 0)
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\" from %u block",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname, start_blkno)));
+ else
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\"",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname)));
+ }
empty_pages = vacuumed_pages = 0;
next_fsm_block_to_vacuum = (BlockNumber) 0;
+ next_block_to_resume = (BlockNumber) 0;
num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
indstats = (IndexBulkDeleteResult **)
palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
nblocks = RelationGetNumberOfBlocks(onerel);
+ Assert(start_blkno <= nblocks); /* both are the same iif it's empty */
+
vacrelstats->rel_pages = nblocks;
vacrelstats->scanned_pages = 0;
vacrelstats->tupcount_pages = 0;
@@ -606,7 +642,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
- next_unskippable_block = 0;
+ next_unskippable_block = start_blkno;
if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
while (next_unskippable_block < nblocks)
@@ -635,7 +671,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;
- for (blkno = 0; blkno < nblocks; blkno++)
+ for (blkno = start_blkno; blkno < nblocks; blkno++)
{
Buffer buf;
Page page;
@@ -799,6 +835,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
next_fsm_block_to_vacuum = blkno;
+ /* Save the current block number to resume vacuuming */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+
/* Report that we are once again scanning the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);
@@ -1271,6 +1312,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
*/
vacrelstats->num_dead_tuples = 0;
+ /* Save the current block number to resume vacuuming */
+ if (blkno - next_block_to_resume >= VACUUM_RESUME_BLK_INTERVAL)
+ {
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+ next_block_to_resume = blkno;
+ }
+
/*
* Periodically do incremental FSM vacuuming to make newly-freed
* space visible on upper FSM pages. Note: although we've cleaned
@@ -1458,6 +1508,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
if (blkno > next_fsm_block_to_vacuum)
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+ /* Clear the saved block number */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+
/* report all blocks vacuumed; and that we're cleaning up */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
@@ -2354,3 +2409,29 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
return all_visible;
}
+
+/*
+ * Return the block number to resume vacuuming from stats collector.
+ */
+static BlockNumber
+get_resume_block(Relation onerel)
+{
+ Oid oid = RelationGetRelid(onerel);
+ bool found;
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+
+ dbentry = pgstat_fetch_stat_dbentry(MyDatabaseId);
+ tabentry = hash_search(dbentry->tables, (void *) &oid,
+ HASH_FIND, &found);
+
+ /*
+ * If not found a valid saved block number, resume from the
+ * first block.
+ */
+ if (!found ||
+ tabentry->vacuum_resume_block >= RelationGetNumberOfBlocks(onerel))
+ return (BlockNumber) 0;
+
+ return tabentry->vacuum_resume_block;
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e395..3e669561df 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -566,6 +566,7 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_live_tuples(C.oid) AS n_live_tup,
pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(C.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(C.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
pg_stat_get_last_analyze_time(C.oid) as last_analyze,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e7b379dfda..4602a96ec4 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -99,6 +99,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
/* Set default value */
params.index_cleanup = VACOPT_TERNARY_DEFAULT;
params.truncate = VACOPT_TERNARY_DEFAULT;
+ params.resume = VACOPT_TERNARY_DEFAULT;
/* Parse options list */
foreach(lc, vacstmt->options)
@@ -127,6 +128,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
disable_page_skipping = defGetBoolean(opt);
else if (strcmp(opt->defname, "index_cleanup") == 0)
params.index_cleanup = get_vacopt_ternary_value(opt);
+ else if (strcmp(opt->defname, "resume") == 0)
+ params.resume = get_vacopt_ternary_value(opt);
else if (strcmp(opt->defname, "truncate") == 0)
params.truncate = get_vacopt_ternary_value(opt);
else
@@ -1774,6 +1777,16 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
params->truncate = VACOPT_TERNARY_DISABLED;
}
+ /* Set resume option based on reloptions if not yet, default is false */
+ if (params->resume == VACOPT_TERNARY_DEFAULT)
+ {
+ if (onerel->rd_options == NULL ||
+ !((StdRdOptions *) onerel->rd_options)->vacuum_resume)
+ params->resume = VACOPT_TERNARY_DISABLED;
+ else
+ params->resume = VACOPT_TERNARY_ENABLED;
+ }
+
/*
* Remember the relation's TOAST relation for later, if the caller asked
* us to process it. In VACUUM FULL, though, the toast table is
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index b4f2b28b51..1d75240081 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -328,6 +328,8 @@ static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, in
static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len);
static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
+static void pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg,
+ int len);
static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
@@ -1425,6 +1427,27 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
pgstat_send(&msg, sizeof(msg));
}
+/* ---------
+ * pgstat_report_vacuum_resume_block() -
+ *
+ * Tell the collector about the block number to resume.
+ * ---------
+ */
+void
+pgstat_report_vacuum_resume_block(Oid tableoid, bool shared, BlockNumber blkno)
+{
+ PgStat_MsgVacuumResumeBlock msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACUUMRESUMEBLOCK);
+ msg.m_databaseid = shared ? InvalidOid : MyDatabaseId;
+ msg.m_tableoid = tableoid;
+ msg.m_blkno = blkno;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* --------
* pgstat_report_analyze() -
*
@@ -4594,6 +4617,11 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_vacuum(&msg.msg_vacuum, len);
break;
+ case PGSTAT_MTYPE_VACUUMRESUMEBLOCK:
+ pgstat_recv_vacuum_resume_block(&msg.msg_vacuum_resume_block,
+ len);
+ break;
+
case PGSTAT_MTYPE_ANALYZE:
pgstat_recv_analyze(&msg.msg_analyze, len);
break;
@@ -6230,6 +6258,20 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
}
}
+static void
+pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+
+ /*
+ * Store the data in the table's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+ tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
+ tabentry->vacuum_resume_block = msg->m_blkno;
+}
+
/* ----------
* pgstat_recv_analyze() -
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..9a9872caaf 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -195,6 +195,20 @@ pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_vacuum_resume_block(PG_FUNCTION_ARGS)
+{
+ Oid relid = PG_GETARG_OID(0);
+ uint32 result;
+ PgStat_StatTabEntry *tabentry;
+
+ if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (tabentry->vacuum_resume_block);
+
+ PG_RETURN_INT64(result);
+}
Datum
pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 604470cb81..f045462e3e 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5065,6 +5065,11 @@
proname => 'pg_stat_get_mod_since_analyze', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_mod_since_analyze' },
+{ oid => '4219',
+ descr => 'statistics: block number to resume vacuuming',
+ proname => 'pg_stat_get_vacuum_resume_block', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_vacuum_resume_block' },
{ oid => '1934', descr => 'statistics: number of blocks fetched',
proname => 'pg_stat_get_blocks_fetched', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 128f7ae65d..c3006c52a3 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -145,7 +145,8 @@ typedef enum VacuumOption
VACOPT_FULL = 1 << 4, /* FULL (non-concurrent) vacuum */
VACOPT_SKIP_LOCKED = 1 << 5, /* skip if cannot get lock */
VACOPT_SKIPTOAST = 1 << 6, /* don't process the TOAST table, if any */
- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7, /* don't skip any pages */
+ VACOPT_RESUME = 1 << 8 /* resume from the previous point */
} VacuumOption;
/*
@@ -184,6 +185,8 @@ typedef struct VacuumParams
* default value depends on reloptions */
VacOptTernaryValue truncate; /* Truncate empty pages at the end,
* default value depends on reloptions */
+ VacOptTernaryValue resume; /* Resume vacuuming from the last vacuumed
+ * block */
} VacuumParams;
/* GUC parameters */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 0a3ad3a188..5c662039e4 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -57,6 +57,7 @@ typedef enum StatMsgType
PGSTAT_MTYPE_RESETSINGLECOUNTER,
PGSTAT_MTYPE_AUTOVAC_START,
PGSTAT_MTYPE_VACUUM,
+ PGSTAT_MTYPE_VACUUMRESUMEBLOCK,
PGSTAT_MTYPE_ANALYZE,
PGSTAT_MTYPE_ARCHIVER,
PGSTAT_MTYPE_BGWRITER,
@@ -372,6 +373,14 @@ typedef struct PgStat_MsgVacuum
PgStat_Counter m_dead_tuples;
} PgStat_MsgVacuum;
+typedef struct PgStat_MsgVacuumResumeBlock
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ Oid m_tableoid;
+ BlockNumber m_blkno;
+} PgStat_MsgVacuumResumeBlock;
+
/* ----------
* PgStat_MsgAnalyze Sent by the backend or autovacuum daemon
@@ -562,6 +571,7 @@ typedef union PgStat_Msg
PgStat_MsgResetsinglecounter msg_resetsinglecounter;
PgStat_MsgAutovacStart msg_autovacuum_start;
PgStat_MsgVacuum msg_vacuum;
+ PgStat_MsgVacuumResumeBlock msg_vacuum_resume_block;
PgStat_MsgAnalyze msg_analyze;
PgStat_MsgArchiver msg_archiver;
PgStat_MsgBgWriter msg_bgwriter;
@@ -651,6 +661,8 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ BlockNumber vacuum_resume_block;
+
TimestampTz vacuum_timestamp; /* user initiated vacuum */
PgStat_Counter vacuum_count;
TimestampTz autovac_vacuum_timestamp; /* autovacuum initiated */
@@ -1263,6 +1275,8 @@ extern void pgstat_reset_single_counter(Oid objectid, PgStat_Single_Reset_Type t
extern void pgstat_report_autovac(Oid dboid);
extern void pgstat_report_vacuum(Oid tableoid, bool shared,
PgStat_Counter livetuples, PgStat_Counter deadtuples);
+extern void pgstat_report_vacuum_resume_block(Oid tableoid, bool shared,
+ BlockNumber blkno);
extern void pgstat_report_analyze(Relation rel,
PgStat_Counter livetuples, PgStat_Counter deadtuples,
bool resetcounter);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d35b4a5061..3dda8aa019 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -270,6 +270,8 @@ typedef struct StdRdOptions
int parallel_workers; /* max number of parallel workers */
bool vacuum_index_cleanup; /* enables index vacuuming and cleanup */
bool vacuum_truncate; /* enables vacuum to truncate a relation */
+ bool vacuum_resume; /* enables vacuum to resuming from last
+ * vacuumed block. */
} StdRdOptions;
#define HEAP_MIN_FILLFACTOR 10
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..1edd97fdc6 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1771,6 +1771,7 @@ pg_stat_all_tables| SELECT c.oid AS relid,
pg_stat_get_live_tuples(c.oid) AS n_live_tup,
pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(c.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(c.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,
pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,
pg_stat_get_last_analyze_time(c.oid) AS last_analyze,
@@ -2000,6 +2001,7 @@ pg_stat_sys_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
@@ -2043,6 +2045,7 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 02c53e3058..a0033e23aa 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -148,6 +148,25 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -213,6 +232,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
CREATE TABLE vacowned_parted (a int) PARTITION BY LIST (a);
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 6ffb495546..dd2606a7dc 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -121,6 +121,26 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
+
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -176,6 +196,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
--
2.22.0
On Monday, August 19, 2019 10:39 AM (GMT+9), Masahiko Sawada wrote:
Fixed.
Attached the updated version patch.
Hi Sawada-san,
I haven't tested it with heavily updated large tables, but I think the patch
is reasonable as it helps to shorten the execution time of vacuum by removing
the redundant vacuuming and prioritizing reclaiming the garbage instead.
I'm not sure if it's commonly reported to have problems even when we repeat
vacuuming the already-vacuumed blocks, but I think it's a reasonable improvement.
I skimmed the patch and have a few comments. If you deem them fit, feel free
to follow, but it's also ok if you don't.
1.
+ <entry>Block number to resume vacuuming from</entry>
Perhaps you could drop "from".
2.
+ <xref linkend="pg-stat-all-tables-view"/>. This behavior is helpful
+ when to resume vacuuming from interruption and cancellation.The default
when resuming vacuum run from interruption and cancellation.
There should also be a space between the period and "The".
3.
+ set to true. This option is ignored if either the <literal>FULL</literal>,
+ the <literal>FREEZE</literal> or <literal>DISABLE_PAGE_SKIPPING</literal>
+ option is used.
..if either of the <literal>FULL</literal>, <literal>FREEZE</literal>, or <literal>DISABLE_PAGE_SKIPPING</literal> options is used.
4.
+ next_fsm_block_to_vacuum,
+ next_block_to_resume;
Clearer one would be "next_block_to_resume_vacuum"?
You may disregard if that's too long.
5.
+ Assert(start_blkno <= nblocks); /* both are the same iif it's empty */
iif -> if there are no blocks / if nblocks is 0
6.
+ * If not found a valid saved block number, resume from the
+ * first block.
+ */
+ if (!found ||
+ tabentry->vacuum_resume_block >= RelationGetNumberOfBlocks(onerel))
This describes the case when vacuum_resume_block > RelationGetNumberOfBlocks(), doesn't it?
Perhaps a better framing would be
"If the saved block number is found invalid,...",
7.
+ bool vacuum_resume; /* enables vacuum to resuming from last
+ * vacuumed block. */
resuming --> resume
Regards,
Kirk Jamison
On Tue, Aug 27, 2019 at 2:55 PM Jamison, Kirk <k.jamison@jp.fujitsu.com> wrote:
On Monday, August 19, 2019 10:39 AM (GMT+9), Masahiko Sawada wrote:
Fixed.
Attached the updated version patch.
Hi Sawada-san,
I haven't tested it with heavily updated large tables, but I think the patch
is reasonable as it helps to shorten the execution time of vacuum by removing
the redundant vacuuming and prioritizing reclaiming the garbage instead.
I'm not sure if it's commonly reported to have problems even when we repeat
vacuuming the already-vacuumed blocks, but I think it's a reasonable improvement.
I skimmed the patch and have few comments. If they deem fit, feel free to
follow, but it's also ok if you don't.
Thank you for reviewing this patch! I've attached the updated patch
incorporating all your comments and some improvements.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
v3-0001-Add-RESUME-option-to-VACUUM-and-autovacuum.patch (application/x-patch)
From 0bd1cf64aa55977029371e3402d05240dfdfec21 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 10 Jul 2019 19:02:56 +0900
Subject: [PATCH v3] Add RESUME option to VACUUM and autovacuum.
This commit adds a new reloption, vacuum_resume, which controls
whether vacuum attempts to resume vacuuming from the last vacuumed
block saved in the vacuum_resume_block column of pg_stat_all_tables. It
also adds a new option to the VACUUM command, RESUME, which can be used
to override the reloption.
---
doc/src/sgml/monitoring.sgml | 5 ++
doc/src/sgml/ref/vacuum.sgml | 18 +++++++
src/backend/access/common/reloptions.c | 13 ++++-
src/backend/access/heap/vacuumlazy.c | 88 +++++++++++++++++++++++++++++++---
src/backend/catalog/system_views.sql | 1 +
src/backend/commands/vacuum.c | 13 +++++
src/backend/postmaster/pgstat.c | 42 ++++++++++++++++
src/backend/utils/adt/pgstatfuncs.c | 14 ++++++
src/include/catalog/pg_proc.dat | 5 ++
src/include/commands/vacuum.h | 5 +-
src/include/pgstat.h | 14 ++++++
src/include/utils/rel.h | 2 +
src/test/regress/expected/rules.out | 3 ++
src/test/regress/expected/vacuum.out | 20 ++++++++
src/test/regress/sql/vacuum.sql | 21 ++++++++
15 files changed, 255 insertions(+), 9 deletions(-)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index bf72d0c..dfd8669 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2785,6 +2785,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry>Estimated number of rows modified since this table was last analyzed</entry>
</row>
<row>
+ <entry><structfield>vacuum_resume_block</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Block number to resume vacuuming</entry>
+ </row>
+ <row>
<entry><structfield>last_vacuum</structfield></entry>
<entry><type>timestamp with time zone</type></entry>
<entry>Last time at which this table was manually vacuumed
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index f9b0fb8..ae3798b 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
INDEX_CLEANUP [ <replaceable class="parameter">boolean</replaceable> ]
TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
+ RESUME [ <replaceable class="parameter">boolean</replaceable> ]
<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
@@ -224,6 +225,23 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
</varlistentry>
<varlistentry>
+ <term><literal>RESUME</literal></term>
+ <listitem>
+ <para>
+ Specifies that <command>VACUUM</command> should attempt to
+ resume vacuuming from the last vacuumed block saved in the
+ <literal>vacuum_resume_block</literal> column of
+ <xref linkend="pg-stat-all-tables-view"/>. This behavior is helpful
+ when resuming a vacuum run after an interruption or cancellation. The
+ default is false unless the <literal>vacuum_resume</literal> reloption
+ has been set to true. This option is ignored if the <literal>FULL</literal>,
+ <literal>FREEZE</literal>, or <literal>DISABLE_PAGE_SKIPPING</literal>
+ option is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><replaceable class="parameter">boolean</replaceable></term>
<listitem>
<para>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 42647b0..0200eb0 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -158,6 +158,15 @@ static relopt_bool boolRelOpts[] =
},
true
},
+ {
+ {
+ "vacuum_resume",
+ "Enables vacuum to resume vacuuming from the last vacuumed block",
+ RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+ ShareUpdateExclusiveLock
+ },
+ false
+ },
/* list terminator */
{{NULL}}
};
@@ -1412,7 +1421,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
{"vacuum_index_cleanup", RELOPT_TYPE_BOOL,
offsetof(StdRdOptions, vacuum_index_cleanup)},
{"vacuum_truncate", RELOPT_TYPE_BOOL,
- offsetof(StdRdOptions, vacuum_truncate)}
+ offsetof(StdRdOptions, vacuum_truncate)},
+ {"vacuum_resume", RELOPT_TYPE_BOOL,
+ offsetof(StdRdOptions, vacuum_resume)}
};
options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1d..1895ca0 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -92,6 +92,14 @@
((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
/*
+ * When a table has no indexes, save the progress every 8GB so that we can
+ * resume vacuuming from the middle of the table. When the table has indexes,
+ * we save it after each second heap pass finishes.
+ */
+#define VACUUM_RESUME_BLK_INTERVAL \
+ ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+
+/*
* Guesstimation of number of dead tuples per page. This is used to
* provide an upper limit to memory allocated when vacuuming small
* tables.
@@ -175,6 +183,7 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static BlockNumber get_resume_block(Relation onerel);
/*
@@ -212,6 +221,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
Assert(params != NULL);
Assert(params->index_cleanup != VACOPT_TERNARY_DEFAULT);
Assert(params->truncate != VACOPT_TERNARY_DEFAULT);
+ Assert(params->resume != VACOPT_TERNARY_DEFAULT);
/* not every AM requires these to be valid, but heap does */
Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
@@ -504,7 +514,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
TransactionId relminmxid = onerel->rd_rel->relminmxid;
BlockNumber empty_pages,
vacuumed_pages,
- next_fsm_block_to_vacuum;
+ next_fsm_block_to_vacuum,
+ next_block_to_resume_vacuum;
double num_tuples, /* total number of nonremovable tuples */
live_tuples, /* live tuples (reltuples estimate) */
tups_vacuumed, /* tuples cleaned up by vacuum */
@@ -515,6 +526,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
+ BlockNumber start_blkno = 0;
bool skipping_blocks;
xl_heap_freeze_tuple *frozen;
StringInfoData buf;
@@ -527,6 +539,19 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
pg_rusage_init(&ru0);
+ /*
+ * If resuming is not requested, we clear the last saved block so as not
+ * to keep stale information. If resuming is requested and this is not an
+ * aggressive vacuum, we fetch the last saved block number and use it as
+ * the starting block of the scan.
+ */
+ if (params->resume == VACOPT_TERNARY_DISABLED)
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+ else if (!aggressive)
+ start_blkno = get_resume_block(onerel);
+
relname = RelationGetRelationName(onerel);
if (aggressive)
ereport(elevel,
@@ -534,19 +559,30 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
get_namespace_name(RelationGetNamespace(onerel)),
relname)));
else
- ereport(elevel,
- (errmsg("vacuuming \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- relname)));
+ {
+ if (start_blkno != 0)
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\" from block %u",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname, start_blkno)));
+ else
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\"",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname)));
+ }
empty_pages = vacuumed_pages = 0;
next_fsm_block_to_vacuum = (BlockNumber) 0;
+ next_block_to_resume_vacuum = (BlockNumber) 0;
num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
indstats = (IndexBulkDeleteResult **)
palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
nblocks = RelationGetNumberOfBlocks(onerel);
+ Assert(start_blkno <= nblocks); /* both are the same if there are no blocks */
+
vacrelstats->rel_pages = nblocks;
vacrelstats->scanned_pages = 0;
vacrelstats->tupcount_pages = 0;
@@ -606,7 +642,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
- next_unskippable_block = 0;
+ next_unskippable_block = start_blkno;
if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
while (next_unskippable_block < nblocks)
@@ -635,7 +671,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;
- for (blkno = 0; blkno < nblocks; blkno++)
+ for (blkno = start_blkno; blkno < nblocks; blkno++)
{
Buffer buf;
Page page;
@@ -799,6 +835,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
next_fsm_block_to_vacuum = blkno;
+ /* Save the current block number to resume vacuuming */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+
/* Report that we are once again scanning the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);
@@ -1271,6 +1312,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
*/
vacrelstats->num_dead_tuples = 0;
+ /* Save the current block number to resume vacuuming */
+ if (blkno - next_block_to_resume_vacuum >= VACUUM_RESUME_BLK_INTERVAL)
+ {
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+ next_block_to_resume_vacuum = blkno;
+ }
+
/*
* Periodically do incremental FSM vacuuming to make newly-freed
* space visible on upper FSM pages. Note: although we've cleaned
@@ -1458,6 +1508,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
if (blkno > next_fsm_block_to_vacuum)
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+ /* Clear the saved block number */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+
/* report all blocks vacuumed; and that we're cleaning up */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
@@ -2354,3 +2409,22 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
return all_visible;
}
+
+/*
+ * Return the block number to resume vacuuming fetched from stats collector.
+ */
+static BlockNumber
+get_resume_block(Relation onerel)
+{
+ PgStat_StatTabEntry *tabentry;
+
+ tabentry = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
+
+ /* If no valid saved block number was found, resume from the first block */
+ if (tabentry == NULL ||
+ tabentry->vacuum_resume_block >= RelationGetNumberOfBlocks(onerel))
+ return (BlockNumber) 0;
+
+ Assert(tabentry->vacuum_resume_block >= 0);
+ return tabentry->vacuum_resume_block;
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index ea4c85e..3e66956 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -566,6 +566,7 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_live_tuples(C.oid) AS n_live_tup,
pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(C.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(C.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
pg_stat_get_last_analyze_time(C.oid) as last_analyze,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index e154507..f35c132 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -99,6 +99,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
/* Set default value */
params.index_cleanup = VACOPT_TERNARY_DEFAULT;
params.truncate = VACOPT_TERNARY_DEFAULT;
+ params.resume = VACOPT_TERNARY_DEFAULT;
/* Parse options list */
foreach(lc, vacstmt->options)
@@ -127,6 +128,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
disable_page_skipping = defGetBoolean(opt);
else if (strcmp(opt->defname, "index_cleanup") == 0)
params.index_cleanup = get_vacopt_ternary_value(opt);
+ else if (strcmp(opt->defname, "resume") == 0)
+ params.resume = get_vacopt_ternary_value(opt);
else if (strcmp(opt->defname, "truncate") == 0)
params.truncate = get_vacopt_ternary_value(opt);
else
@@ -1783,6 +1786,16 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
params->truncate = VACOPT_TERNARY_DISABLED;
}
+ /* Set resume option based on reloptions if not yet, default is false */
+ if (params->resume == VACOPT_TERNARY_DEFAULT)
+ {
+ if (onerel->rd_options == NULL ||
+ !((StdRdOptions *) onerel->rd_options)->vacuum_resume)
+ params->resume = VACOPT_TERNARY_DISABLED;
+ else
+ params->resume = VACOPT_TERNARY_ENABLED;
+ }
+
/*
* Remember the relation's TOAST relation for later, if the caller asked
* us to process it. In VACUUM FULL, though, the toast table is
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d362e7f..5d690dc 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -328,6 +328,8 @@ static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, in
static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len);
static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
+static void pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg,
+ int len);
static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
@@ -1425,6 +1427,27 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
pgstat_send(&msg, sizeof(msg));
}
+/* ---------
+ * pgstat_report_vacuum_resume_block() -
+ *
+ * Tell the collector about the block number to resume.
+ * ---------
+ */
+void
+pgstat_report_vacuum_resume_block(Oid tableoid, bool shared, BlockNumber blkno)
+{
+ PgStat_MsgVacuumResumeBlock msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACUUMRESUMEBLOCK);
+ msg.m_databaseid = shared ? InvalidOid : MyDatabaseId;
+ msg.m_tableoid = tableoid;
+ msg.m_blkno = blkno;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* --------
* pgstat_report_analyze() -
*
@@ -4594,6 +4617,11 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_vacuum(&msg.msg_vacuum, len);
break;
+ case PGSTAT_MTYPE_VACUUMRESUMEBLOCK:
+ pgstat_recv_vacuum_resume_block(&msg.msg_vacuum_resume_block,
+ len);
+ break;
+
case PGSTAT_MTYPE_ANALYZE:
pgstat_recv_analyze(&msg.msg_analyze, len);
break;
@@ -6230,6 +6258,20 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
}
}
+static void
+pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+
+ /*
+ * Store the data in the table's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+ tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
+ tabentry->vacuum_resume_block = msg->m_blkno;
+}
+
/* ----------
* pgstat_recv_analyze() -
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bf..961c598 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -195,6 +195,20 @@ pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_vacuum_resume_block(PG_FUNCTION_ARGS)
+{
+ Oid relid = PG_GETARG_OID(0);
+ uint64 result;
+ PgStat_StatTabEntry *tabentry;
+
+ if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (tabentry->vacuum_resume_block);
+
+ PG_RETURN_INT64(result);
+}
Datum
pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index cf1f409..126ade9 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5093,6 +5093,11 @@
proname => 'pg_stat_get_mod_since_analyze', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_mod_since_analyze' },
+{ oid => '4219',
+ descr => 'statistics: block number to resume vacuuming',
+ proname => 'pg_stat_get_vacuum_resume_block', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_vacuum_resume_block' },
{ oid => '1934', descr => 'statistics: number of blocks fetched',
proname => 'pg_stat_get_blocks_fetched', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 128f7ae..c3006c5 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -145,7 +145,8 @@ typedef enum VacuumOption
VACOPT_FULL = 1 << 4, /* FULL (non-concurrent) vacuum */
VACOPT_SKIP_LOCKED = 1 << 5, /* skip if cannot get lock */
VACOPT_SKIPTOAST = 1 << 6, /* don't process the TOAST table, if any */
- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7, /* don't skip any pages */
+ VACOPT_RESUME = 1 << 8 /* resume from the previous point */
} VacuumOption;
/*
@@ -184,6 +185,8 @@ typedef struct VacuumParams
* default value depends on reloptions */
VacOptTernaryValue truncate; /* Truncate empty pages at the end,
* default value depends on reloptions */
+ VacOptTernaryValue resume; /* Resume vacuuming from the last vacuumed
+ * block */
} VacuumParams;
/* GUC parameters */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d8..00604b5 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -56,6 +56,7 @@ typedef enum StatMsgType
PGSTAT_MTYPE_RESETSINGLECOUNTER,
PGSTAT_MTYPE_AUTOVAC_START,
PGSTAT_MTYPE_VACUUM,
+ PGSTAT_MTYPE_VACUUMRESUMEBLOCK,
PGSTAT_MTYPE_ANALYZE,
PGSTAT_MTYPE_ARCHIVER,
PGSTAT_MTYPE_BGWRITER,
@@ -371,6 +372,14 @@ typedef struct PgStat_MsgVacuum
PgStat_Counter m_dead_tuples;
} PgStat_MsgVacuum;
+typedef struct PgStat_MsgVacuumResumeBlock
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ Oid m_tableoid;
+ BlockNumber m_blkno;
+} PgStat_MsgVacuumResumeBlock;
+
/* ----------
* PgStat_MsgAnalyze Sent by the backend or autovacuum daemon
@@ -561,6 +570,7 @@ typedef union PgStat_Msg
PgStat_MsgResetsinglecounter msg_resetsinglecounter;
PgStat_MsgAutovacStart msg_autovacuum_start;
PgStat_MsgVacuum msg_vacuum;
+ PgStat_MsgVacuumResumeBlock msg_vacuum_resume_block;
PgStat_MsgAnalyze msg_analyze;
PgStat_MsgArchiver msg_archiver;
PgStat_MsgBgWriter msg_bgwriter;
@@ -650,6 +660,8 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ BlockNumber vacuum_resume_block;
+
TimestampTz vacuum_timestamp; /* user initiated vacuum */
PgStat_Counter vacuum_count;
TimestampTz autovac_vacuum_timestamp; /* autovacuum initiated */
@@ -1262,6 +1274,8 @@ extern void pgstat_reset_single_counter(Oid objectid, PgStat_Single_Reset_Type t
extern void pgstat_report_autovac(Oid dboid);
extern void pgstat_report_vacuum(Oid tableoid, bool shared,
PgStat_Counter livetuples, PgStat_Counter deadtuples);
+extern void pgstat_report_vacuum_resume_block(Oid tableoid, bool shared,
+ BlockNumber blkno);
extern void pgstat_report_analyze(Relation rel,
PgStat_Counter livetuples, PgStat_Counter deadtuples,
bool resetcounter);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 91b3b1b..7e65586 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -272,6 +272,8 @@ typedef struct StdRdOptions
int parallel_workers; /* max number of parallel workers */
bool vacuum_index_cleanup; /* enables index vacuuming and cleanup */
bool vacuum_truncate; /* enables vacuum to truncate a relation */
+ bool vacuum_resume; /* enables vacuum to resume from last
+ * vacuumed block. */
} StdRdOptions;
#define HEAP_MIN_FILLFACTOR 10
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd..1edd97f 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1771,6 +1771,7 @@ pg_stat_all_tables| SELECT c.oid AS relid,
pg_stat_get_live_tuples(c.oid) AS n_live_tup,
pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(c.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(c.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,
pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,
pg_stat_get_last_analyze_time(c.oid) AS last_analyze,
@@ -2000,6 +2001,7 @@ pg_stat_sys_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
@@ -2043,6 +2045,7 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index aff0b10..d2f5de8 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -148,6 +148,25 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -217,6 +236,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
CREATE TABLE vacowned_parted (a int) PARTITION BY LIST (a);
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index f0fee3a..a1f2ad8 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -121,6 +121,26 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
+
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -180,6 +200,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
--
2.10.5
Apparently this patch now has a duplicate OID. Please do use random
OIDs >8000 as suggested by the unused_oids script.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Sep 26, 2019 at 1:53 AM Alvaro Herrera <alvherre@2ndquadrant.com>
wrote:
Apparently this patch now has a duplicate OID. Please do use random
OIDs >8000 as suggested by the unused_oids script.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I have updated the patch using OIDs > 8000
--
Ibrar Ahmed
Attachments:
v4-0001-Add-RESUME-option-to-VACUUM-and-autovacuum.patch (application/octet-stream)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e6c49eebad..1aacaa94a8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2784,6 +2784,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><type>bigint</type></entry>
<entry>Estimated number of rows modified since this table was last analyzed</entry>
</row>
+ <row>
+ <entry><structfield>vacuum_resume_block</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Block number at which vacuum will resume</entry>
+ </row>
<row>
<entry><structfield>last_vacuum</structfield></entry>
<entry><type>timestamp with time zone</type></entry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index f9b0fb8794..ae3798b44f 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -34,6 +34,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
SKIP_LOCKED [ <replaceable class="parameter">boolean</replaceable> ]
INDEX_CLEANUP [ <replaceable class="parameter">boolean</replaceable> ]
TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
+ RESUME [ <replaceable class="parameter">boolean</replaceable> ]
<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
@@ -223,6 +224,23 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>RESUME</literal></term>
+ <listitem>
+ <para>
+ Specifies that <command>VACUUM</command> should attempt to
+ resume vacuuming from the last vacuumed block saved in the
+ <literal>vacuum_resume_block</literal> column of
+ <xref linkend="pg-stat-all-tables-view"/>. This behavior is helpful
+ when resuming a vacuum run after an interruption or cancellation. The
+ default is false unless the <literal>vacuum_resume</literal> reloption
+ has been set to true. This option is ignored if the <literal>FULL</literal>,
+ <literal>FREEZE</literal>, or <literal>DISABLE_PAGE_SKIPPING</literal>
+ option is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><replaceable class="parameter">boolean</replaceable></term>
<listitem>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index b5072c00fe..92f6ae6784 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -158,6 +158,15 @@ static relopt_bool boolRelOpts[] =
},
true
},
+ {
+ {
+ "vacuum_resume",
+ "Enables vacuum to resume vacuuming from the last vacuumed block",
+ RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+ ShareUpdateExclusiveLock
+ },
+ false
+ },
/* list terminator */
{{NULL}}
};
@@ -1518,7 +1527,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
{"vacuum_index_cleanup", RELOPT_TYPE_BOOL,
offsetof(StdRdOptions, vacuum_index_cleanup)},
{"vacuum_truncate", RELOPT_TYPE_BOOL,
- offsetof(StdRdOptions, vacuum_truncate)}
+ offsetof(StdRdOptions, vacuum_truncate)},
+ {"vacuum_resume", RELOPT_TYPE_BOOL,
+ offsetof(StdRdOptions, vacuum_resume)}
};
options = parseRelOptions(reloptions, validate, kind, &numoptions);
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index a3c4a1df3b..1895ca0c5b 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -91,6 +91,14 @@
#define VACUUM_FSM_EVERY_PAGES \
((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+/*
+ * When a table has no indexes, save the progress every 8GB so that we can
+ * resume vacuuming from the middle of the table. When the table has indexes,
+ * we save it after each second heap pass finishes.
+ */
+#define VACUUM_RESUME_BLK_INTERVAL \
+ ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+
/*
* Guesstimation of number of dead tuples per page. This is used to
* provide an upper limit to memory allocated when vacuuming small
@@ -175,6 +183,7 @@ static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static BlockNumber get_resume_block(Relation onerel);
/*
@@ -212,6 +221,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
Assert(params != NULL);
Assert(params->index_cleanup != VACOPT_TERNARY_DEFAULT);
Assert(params->truncate != VACOPT_TERNARY_DEFAULT);
+ Assert(params->resume != VACOPT_TERNARY_DEFAULT);
/* not every AM requires these to be valid, but heap does */
Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
@@ -504,7 +514,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
TransactionId relminmxid = onerel->rd_rel->relminmxid;
BlockNumber empty_pages,
vacuumed_pages,
- next_fsm_block_to_vacuum;
+ next_fsm_block_to_vacuum,
+ next_block_to_resume_vacuum;
double num_tuples, /* total number of nonremovable tuples */
live_tuples, /* live tuples (reltuples estimate) */
tups_vacuumed, /* tuples cleaned up by vacuum */
@@ -515,6 +526,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
+ BlockNumber start_blkno = 0;
bool skipping_blocks;
xl_heap_freeze_tuple *frozen;
StringInfoData buf;
@@ -527,6 +539,19 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
pg_rusage_init(&ru0);
+ /*
+ * If resuming is not requested, we clear the last saved block so as not
+ * to keep stale information. If resuming is requested and this is not an
+ * aggressive vacuum, we fetch the last saved block number and use it as
+ * the starting block of the scan.
+ */
+ if (params->resume == VACOPT_TERNARY_DISABLED)
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+ else if (!aggressive)
+ start_blkno = get_resume_block(onerel);
+
relname = RelationGetRelationName(onerel);
if (aggressive)
ereport(elevel,
@@ -534,19 +559,30 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
get_namespace_name(RelationGetNamespace(onerel)),
relname)));
else
- ereport(elevel,
- (errmsg("vacuuming \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- relname)));
+ {
+ if (start_blkno != 0)
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\" from block %u",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname, start_blkno)));
+ else
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\"",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname)));
+ }
empty_pages = vacuumed_pages = 0;
next_fsm_block_to_vacuum = (BlockNumber) 0;
+ next_block_to_resume_vacuum = (BlockNumber) 0;
num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
indstats = (IndexBulkDeleteResult **)
palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
nblocks = RelationGetNumberOfBlocks(onerel);
+ Assert(start_blkno <= nblocks); /* both are the same if there are no blocks */
+
vacrelstats->rel_pages = nblocks;
vacrelstats->scanned_pages = 0;
vacrelstats->tupcount_pages = 0;
@@ -606,7 +642,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
- next_unskippable_block = 0;
+ next_unskippable_block = start_blkno;
if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
while (next_unskippable_block < nblocks)
@@ -635,7 +671,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;
- for (blkno = 0; blkno < nblocks; blkno++)
+ for (blkno = start_blkno; blkno < nblocks; blkno++)
{
Buffer buf;
Page page;
@@ -799,6 +835,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
next_fsm_block_to_vacuum = blkno;
+ /* Save the current block number to resume vacuuming */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+
/* Report that we are once again scanning the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);
@@ -1271,6 +1312,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
*/
vacrelstats->num_dead_tuples = 0;
+ /* Save the current block number to resume vacuuming */
+ if (blkno - next_block_to_resume_vacuum >= VACUUM_RESUME_BLK_INTERVAL)
+ {
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+ next_block_to_resume_vacuum = blkno;
+ }
+
/*
* Periodically do incremental FSM vacuuming to make newly-freed
* space visible on upper FSM pages. Note: although we've cleaned
@@ -1458,6 +1508,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
if (blkno > next_fsm_block_to_vacuum)
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+ /* Clear the saved block number */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+
/* report all blocks vacuumed; and that we're cleaning up */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
@@ -2354,3 +2409,22 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
return all_visible;
}
+
+/*
+ * Return the block number to resume vacuuming fetched from stats collector.
+ */
+static BlockNumber
+get_resume_block(Relation onerel)
+{
+ PgStat_StatTabEntry *tabentry;
+
+ tabentry = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
+
+ /* If no valid saved block number is found, resume from the first block */
+ if (tabentry == NULL ||
+ tabentry->vacuum_resume_block >= RelationGetNumberOfBlocks(onerel))
+ return (BlockNumber) 0;
+
+ Assert(tabentry->vacuum_resume_block >= 0);
+ return tabentry->vacuum_resume_block;
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 9fe4a4794a..0982296ed5 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -566,6 +566,7 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_live_tuples(C.oid) AS n_live_tup,
pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(C.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(C.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
pg_stat_get_last_analyze_time(C.oid) as last_analyze,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 4b67b40b28..70471a1ee0 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -99,6 +99,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
/* Set default value */
params.index_cleanup = VACOPT_TERNARY_DEFAULT;
params.truncate = VACOPT_TERNARY_DEFAULT;
+ params.resume = VACOPT_TERNARY_DEFAULT;
/* Parse options list */
foreach(lc, vacstmt->options)
@@ -127,6 +128,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
disable_page_skipping = defGetBoolean(opt);
else if (strcmp(opt->defname, "index_cleanup") == 0)
params.index_cleanup = get_vacopt_ternary_value(opt);
+ else if (strcmp(opt->defname, "resume") == 0)
+ params.resume = get_vacopt_ternary_value(opt);
else if (strcmp(opt->defname, "truncate") == 0)
params.truncate = get_vacopt_ternary_value(opt);
else
@@ -1789,6 +1792,16 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
params->truncate = VACOPT_TERNARY_DISABLED;
}
+ /* Set resume option based on reloptions if not set yet; the default is false */
+ if (params->resume == VACOPT_TERNARY_DEFAULT)
+ {
+ if (onerel->rd_options == NULL ||
+ !((StdRdOptions *) onerel->rd_options)->vacuum_resume)
+ params->resume = VACOPT_TERNARY_DISABLED;
+ else
+ params->resume = VACOPT_TERNARY_ENABLED;
+ }
+
/*
* Remember the relation's TOAST relation for later, if the caller asked
* us to process it. In VACUUM FULL, though, the toast table is
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 011076c3e3..d45c8ee8b0 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -328,6 +328,8 @@ static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, in
static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len);
static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
+static void pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg,
+ int len);
static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
@@ -1425,6 +1427,27 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
pgstat_send(&msg, sizeof(msg));
}
+/* ---------
+ * pgstat_report_vacuum_resume_block() -
+ *
+ * Tell the collector about the block number to resume.
+ * ---------
+ */
+void
+pgstat_report_vacuum_resume_block(Oid tableoid, bool shared, BlockNumber blkno)
+{
+ PgStat_MsgVacuumResumeBlock msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACUUMRESUMEBLOCK);
+ msg.m_databaseid = shared ? InvalidOid : MyDatabaseId;
+ msg.m_tableoid = tableoid;
+ msg.m_blkno = blkno;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* --------
* pgstat_report_analyze() -
*
@@ -4594,6 +4617,11 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_vacuum(&msg.msg_vacuum, len);
break;
+ case PGSTAT_MTYPE_VACUUMRESUMEBLOCK:
+ pgstat_recv_vacuum_resume_block(&msg.msg_vacuum_resume_block,
+ len);
+ break;
+
case PGSTAT_MTYPE_ANALYZE:
pgstat_recv_analyze(&msg.msg_analyze, len);
break;
@@ -6230,6 +6258,20 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
}
}
+static void
+pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+
+ /*
+ * Store the data in the table's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+ tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
+ tabentry->vacuum_resume_block = msg->m_blkno;
+}
+
/* ----------
* pgstat_recv_analyze() -
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 05240bfd14..961c59869a 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -195,6 +195,20 @@ pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_vacuum_resume_block(PG_FUNCTION_ARGS)
+{
+ Oid relid = PG_GETARG_OID(0);
+ uint64 result;
+ PgStat_StatTabEntry *tabentry;
+
+ if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (tabentry->vacuum_resume_block);
+
+ PG_RETURN_INT64(result);
+}
Datum
pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58ea5b982b..c605c4abb8 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5099,6 +5099,11 @@
proname => 'pg_stat_get_mod_since_analyze', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_mod_since_analyze' },
+{ oid => '8001',
+ descr => 'statistics: block number to resume vacuuming',
+ proname => 'pg_stat_get_vacuum_resume_block', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_vacuum_resume_block' },
{ oid => '1934', descr => 'statistics: number of blocks fetched',
proname => 'pg_stat_get_blocks_fetched', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 128f7ae65d..c3006c52a3 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -145,7 +145,8 @@ typedef enum VacuumOption
VACOPT_FULL = 1 << 4, /* FULL (non-concurrent) vacuum */
VACOPT_SKIP_LOCKED = 1 << 5, /* skip if cannot get lock */
VACOPT_SKIPTOAST = 1 << 6, /* don't process the TOAST table, if any */
- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7, /* don't skip any pages */
+ VACOPT_RESUME = 1 << 8 /* resume from the previous point */
} VacuumOption;
/*
@@ -184,6 +185,8 @@ typedef struct VacuumParams
* default value depends on reloptions */
VacOptTernaryValue truncate; /* Truncate empty pages at the end,
* default value depends on reloptions */
+ VacOptTernaryValue resume; /* Resume vacuuming from the last vacuumed
+ * block */
} VacuumParams;
/* GUC parameters */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index fe076d823d..00604b52a1 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -56,6 +56,7 @@ typedef enum StatMsgType
PGSTAT_MTYPE_RESETSINGLECOUNTER,
PGSTAT_MTYPE_AUTOVAC_START,
PGSTAT_MTYPE_VACUUM,
+ PGSTAT_MTYPE_VACUUMRESUMEBLOCK,
PGSTAT_MTYPE_ANALYZE,
PGSTAT_MTYPE_ARCHIVER,
PGSTAT_MTYPE_BGWRITER,
@@ -371,6 +372,14 @@ typedef struct PgStat_MsgVacuum
PgStat_Counter m_dead_tuples;
} PgStat_MsgVacuum;
+typedef struct PgStat_MsgVacuumResumeBlock
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ Oid m_tableoid;
+ BlockNumber m_blkno;
+} PgStat_MsgVacuumResumeBlock;
+
/* ----------
* PgStat_MsgAnalyze Sent by the backend or autovacuum daemon
@@ -561,6 +570,7 @@ typedef union PgStat_Msg
PgStat_MsgResetsinglecounter msg_resetsinglecounter;
PgStat_MsgAutovacStart msg_autovacuum_start;
PgStat_MsgVacuum msg_vacuum;
+ PgStat_MsgVacuumResumeBlock msg_vacuum_resume_block;
PgStat_MsgAnalyze msg_analyze;
PgStat_MsgArchiver msg_archiver;
PgStat_MsgBgWriter msg_bgwriter;
@@ -650,6 +660,8 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ BlockNumber vacuum_resume_block;
+
TimestampTz vacuum_timestamp; /* user initiated vacuum */
PgStat_Counter vacuum_count;
TimestampTz autovac_vacuum_timestamp; /* autovacuum initiated */
@@ -1262,6 +1274,8 @@ extern void pgstat_reset_single_counter(Oid objectid, PgStat_Single_Reset_Type t
extern void pgstat_report_autovac(Oid dboid);
extern void pgstat_report_vacuum(Oid tableoid, bool shared,
PgStat_Counter livetuples, PgStat_Counter deadtuples);
+extern void pgstat_report_vacuum_resume_block(Oid tableoid, bool shared,
+ BlockNumber blkno);
extern void pgstat_report_analyze(Relation rel,
PgStat_Counter livetuples, PgStat_Counter deadtuples,
bool resetcounter);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index a5cf804f9f..cc99596d67 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -272,6 +272,8 @@ typedef struct StdRdOptions
int parallel_workers; /* max number of parallel workers */
bool vacuum_index_cleanup; /* enables index vacuuming and cleanup */
bool vacuum_truncate; /* enables vacuum to truncate a relation */
+ bool vacuum_resume; /* enables vacuum to resume from last
+ * vacuumed block. */
} StdRdOptions;
#define HEAP_MIN_FILLFACTOR 10
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 210e9cd146..1edd97fdc6 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1771,6 +1771,7 @@ pg_stat_all_tables| SELECT c.oid AS relid,
pg_stat_get_live_tuples(c.oid) AS n_live_tup,
pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(c.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(c.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,
pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,
pg_stat_get_last_analyze_time(c.oid) AS last_analyze,
@@ -2000,6 +2001,7 @@ pg_stat_sys_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
@@ -2043,6 +2045,7 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index aff0b10a93..d2f5de896f 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -148,6 +148,25 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -217,6 +236,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
CREATE TABLE vacowned_parted (a int) PARTITION BY LIST (a);
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index f0fee3af2b..a1f2ad8354 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -121,6 +121,26 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
+
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -180,6 +200,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
Sounds like an interesting idea, but does it really help? Because if
vacuum was interrupted previously, wouldn't it already know the dead
tuples, etc in the next run quite quickly, as the VM, FSM is already
updated for the page in the previous run.
+1. I don't deny that a patch like this could sometimes save
something, but it doesn't seem like it would save all that much all
that often. If your autovacuum runs are being frequently cancelled,
that's going to be a big problem, I think. And as Rafia says, even
though you might do a little extra work reclaiming garbage from
subsequently-modified pages toward the beginning of the table, it
would be unusual if they'd *all* been modified. Plus, if they've
recently been modified, they're more likely to be in cache.
I think this patch really needs a test scenario or demonstration of
some kind to prove that it produces a measurable benefit.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, 2 Nov 2019 at 02:10, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
Sounds like an interesting idea, but does it really help? Because if
vacuum was interrupted previously, wouldn't it already know the dead
tuples, etc in the next run quite quickly, as the VM, FSM is already
updated for the page in the previous run.
+1. I don't deny that a patch like this could sometimes save
something, but it doesn't seem like it would save all that much all
that often. If your autovacuum runs are being frequently cancelled,
that's going to be a big problem, I think.
I've observed cases where a user wants to cancel a very long running
autovacuum (sometimes an anti-wraparound one) in order to do DDL or
other maintenance work. If the table is very large, autovacuum could
take a long time, and it might not reclaim enough garbage.
And as Rafia says, even
though you might do a little extra work reclaiming garbage from
subsequently-modified pages toward the beginning of the table, it
would be unusual if they'd *all* been modified. Plus, if they've
recently been modified, they're more likely to be in cache.
I think this patch really needs a test scenario or demonstration of
some kind to prove that it produces a measurable benefit.
Okay. A simple test could be that we cancel a long running vacuum on a
large table that is being updated, rerun vacuum, and then measure the
garbage remaining in that table. I'll test it.
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
+ VACOPT_RESUME = 1 << 8 /* resume from the previous point */
This enum value appears to be unused, so I think it can be removed.
Regards,
Yu Kimura
On Tue, 5 Nov 2019 at 15:57, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
On Sat, 2 Nov 2019 at 02:10, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Aug 8, 2019 at 9:42 AM Rafia Sabih <rafia.pghackers@gmail.com> wrote:
Sounds like an interesting idea, but does it really help? Because if
vacuum was interrupted previously, wouldn't it already know the dead
tuples, etc in the next run quite quickly, as the VM, FSM is already
updated for the page in the previous run.
+1. I don't deny that a patch like this could sometimes save
something, but it doesn't seem like it would save all that much all
that often. If your autovacuum runs are being frequently cancelled,
that's going to be a big problem, I think.
I've observed the case where user wants to cancel a very long running
autovacuum (sometimes for anti-wraparound) for doing DDL or something
maintenance works. If the table is very large autovacuum could take a
long time and they might not reclaim garbage enough.
And as Rafia says, even
though you might do a little extra work reclaiming garbage from
subsequently-modified pages toward the beginning of the table, it
would be unusual if they'd *all* been modified. Plus, if they've
recently been modified, they're more likely to be in cache.
I think this patch really needs a test scenario or demonstration of
some kind to prove that it produces a measurable benefit.
Okay. A simple test could be that we cancel a long running vacuum on a
large table that is being updated and rerun vacuum. And then we see
the garbage on that table. I'll test it.
Attached the updated version patch.
I've measured the effect of this patch. In the test, I simulate the
case where autovacuum running on the table that is being updated is
canceled in the middle of vacuum, and then rerun (or resume)
autovacuum on the table. Since the vacuum resume block is saved after
heap vacuum, I set maintenance_work_mem so that vacuum on that table
needs two or more heap vacuum passes. In other words,
maintenance_work_mem is used up at least once during autovacuum. The
detailed steps are:
1. Make table dirty for 15 min
2. Run vacuum with vacuum delays
3. After the first heap vacuum, cancel it
4. Rerun vacuum (or with the patch resume vacuum)
From step #2 through step #4 the table is being updated in the
background. I used pgbench with a \random command, so the table is
updated uniformly. I measured the dead tuple percentage of the table.
In these tests, what matters is how long step #4 took and how much
garbage step #4 collected.
1. Canceled vacuum after processing about 20% of the table at step #2.
1-1. HEAD
After dirtying the table (after step #1): 6.96%
After cancellation (after step #3): 6.13%
At step #4, vacuum reduced it to 4.01% and took 12m 49s. The vacuum
efficiency is 0.16%/m (2.12% down in 12.8min).
1-2. Patched (resume vacuum)
After dirtying the table (after step #1): 6.92%
After cancellation (after step #3): 5.84%
At step #4, vacuum reduced it to 4.32% and took 10m 26s. The vacuum
efficiency is 0.14%/m.
------
2. Canceled vacuum after processing about 40% of the table at step #2.
2-1. HEAD
After dirtying the table (after step #1): 6.97%
After cancellation (after step #3): 4.56%
At step #4, vacuum reduced it to 1.91% and took 8m 15s. The vacuum
efficiency is 0.32%/m.
2-2. Patched (resume vacuum)
After dirtying the table (after step #1): 6.97%
After cancellation (after step #3): 4.46%
At step #4, vacuum reduced it to 1.94% and took 6m 30s. The vacuum
efficiency is 0.38%/m.
-----
3. Canceled vacuum after processing about 70% of the table at step #2.
3-1. HEAD
After dirtying the table (after step #1): 6.97%
After cancellation (after step #3): 4.73%
At step #4, vacuum reduced it to 2.32% and took 8m 11s. The vacuum
efficiency is 0.29%/m.
3-2. Patched (resume vacuum)
After dirtying the table (after step #1): 6.96%
After cancellation (after step #3): 4.73%
At step #4, vacuum reduced it to 3.25% and took 4m 12s. The vacuum
efficiency is 0.35%/m.
According to those results, the closer the resume point is to the end
of the table, the better the efficiency. Since the table was being
updated uniformly even during autovacuum, it was more efficient to
restart autovacuum from the last position rather than from the
beginning of the table. I think these results show some benefit of
this patch, but I'm concerned that it might be difficult for users to
know when to use this option. In practice the efficiency depends
entirely on the dispersion of updated pages, and this test dirtied
pages uniformly, which is not a common situation. So if we want this
feature, we should probably enable resuming automatically when we can
be reasonably sure that resuming is better. For example, we could
remember both the last vacuumed block and how many vacuum-able pages
seem to exist beyond it, and decide to resume vacuuming when we expect
to process more pages that way.
Regards
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachments:
v5-0001-Add-RESUME-option-to-VACUUM-and-autovacuum.patch
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 87586a7b06..7dfadf3875 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -2837,6 +2837,11 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
<entry><type>bigint</type></entry>
<entry>Estimated number of rows modified since this table was last analyzed</entry>
</row>
+ <row>
+ <entry><structfield>vacuum_resume_block</structfield></entry>
+ <entry><type>bigint</type></entry>
+ <entry>Block number to resume vacuuming</entry>
+ </row>
<row>
<entry><structfield>last_vacuum</structfield></entry>
<entry><type>timestamp with time zone</type></entry>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 846056a353..f5d473b178 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -35,6 +35,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
INDEX_CLEANUP [ <replaceable class="parameter">boolean</replaceable> ]
TRUNCATE [ <replaceable class="parameter">boolean</replaceable> ]
PARALLEL <replaceable class="parameter">integer</replaceable>
+ RESUME [ <replaceable class="parameter">boolean</replaceable> ]
<phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
@@ -255,6 +256,23 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><literal>RESUME</literal></term>
+ <listitem>
+ <para>
+ Specifies that <command>VACUUM</command> should attempt to
+ resume vacuuming from the last vacuumed block, saved in the
+ <literal>vacuum_resume_block</literal> column of
+ <xref linkend="pg-stat-all-tables-view"/>. This behavior is helpful
+ when resuming a vacuum run after interruption or cancellation. The default
+ is false unless the <literal>vacuum_resume</literal> reloption has been
+ set to true. This option is ignored if the <literal>FULL</literal>,
+ <literal>FREEZE</literal>, or <literal>DISABLE_PAGE_SKIPPING</literal>
+ option is used.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><replaceable class="parameter">boolean</replaceable></term>
<listitem>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 5325dd3f61..bc6dcd5e53 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -168,6 +168,15 @@ static relopt_bool boolRelOpts[] =
},
true
},
+ {
+ {
+ "vacuum_resume",
+ "Enables vacuum to resume vacuuming from the last vacuumed block",
+ RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+ ShareUpdateExclusiveLock
+ },
+ false
+ },
/* list terminator */
{{NULL}}
};
@@ -1534,7 +1543,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
{"vacuum_index_cleanup", RELOPT_TYPE_BOOL,
offsetof(StdRdOptions, vacuum_index_cleanup)},
{"vacuum_truncate", RELOPT_TYPE_BOOL,
- offsetof(StdRdOptions, vacuum_truncate)}
+ offsetof(StdRdOptions, vacuum_truncate)},
+ {"vacuum_resume", RELOPT_TYPE_BOOL,
+ offsetof(StdRdOptions, vacuum_resume)}
};
return (bytea *) build_reloptions(reloptions, validate, kind,
diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 03c43efc32..f54f0054b6 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -110,6 +110,14 @@
#define VACUUM_FSM_EVERY_PAGES \
((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+/*
+ * When a table has no indexes, save the progress every 8GB so that we can
+ * resume vacuuming from the middle of the table. When the table has indexes,
+ * we save it after each heap-vacuuming (second) pass finishes.
+ */
+#define VACUUM_RESUME_BLK_INTERVAL \
+ ((BlockNumber) (((uint64) 8 * 1024 * 1024 * 1024) / BLCKSZ))
+
/*
* Guesstimation of number of dead tuples per page. This is used to
* provide an upper limit to memory allocated when vacuuming small
@@ -361,6 +369,7 @@ static void end_parallel_vacuum(Relation *Irel, IndexBulkDeleteResult **stats,
LVParallelState *lps, int nindexes);
static LVSharedIndStats *get_indstats(LVShared *lvshared, int n);
static bool skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared);
+static BlockNumber get_resume_block(Relation onerel);
/*
@@ -398,6 +407,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
Assert(params != NULL);
Assert(params->index_cleanup != VACOPT_TERNARY_DEFAULT);
Assert(params->truncate != VACOPT_TERNARY_DEFAULT);
+ Assert(params->resume != VACOPT_TERNARY_DEFAULT);
/* not every AM requires these to be valid, but heap does */
Assert(TransactionIdIsNormal(onerel->rd_rel->relfrozenxid));
@@ -704,7 +714,8 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
TransactionId relminmxid = onerel->rd_rel->relminmxid;
BlockNumber empty_pages,
vacuumed_pages,
- next_fsm_block_to_vacuum;
+ next_fsm_block_to_vacuum,
+ next_block_to_resume_vacuum;
double num_tuples, /* total number of nonremovable tuples */
live_tuples, /* live tuples (reltuples estimate) */
tups_vacuumed, /* tuples cleaned up by vacuum */
@@ -715,6 +726,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
BlockNumber next_unskippable_block;
+ BlockNumber start_blkno = 0;
bool skipping_blocks;
xl_heap_freeze_tuple *frozen;
StringInfoData buf;
@@ -727,6 +739,19 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
pg_rusage_init(&ru0);
+ /*
+ * If resuming is not requested, we clear the last saved block so as not to
+ * keep the previous information. If requested and it is not an aggressive
+ * vacuum, we fetch the last saved block number to resume and set it as the
+ * starting block to vacuum.
+ */
+ if (params->resume == VACOPT_TERNARY_DISABLED)
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+ else if (!aggressive)
+ start_blkno = get_resume_block(onerel);
+
relname = RelationGetRelationName(onerel);
if (aggressive)
ereport(elevel,
@@ -734,19 +759,30 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
get_namespace_name(RelationGetNamespace(onerel)),
relname)));
else
- ereport(elevel,
- (errmsg("vacuuming \"%s.%s\"",
- get_namespace_name(RelationGetNamespace(onerel)),
- relname)));
+ {
+ if (start_blkno != 0)
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\" starting at block %u",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname, start_blkno)));
+ else
+ ereport(elevel,
+ (errmsg("vacuuming \"%s.%s\"",
+ get_namespace_name(RelationGetNamespace(onerel)),
+ relname)));
+ }
empty_pages = vacuumed_pages = 0;
next_fsm_block_to_vacuum = (BlockNumber) 0;
+ next_block_to_resume_vacuum = (BlockNumber) 0;
num_tuples = live_tuples = tups_vacuumed = nkeep = nunused = 0;
indstats = (IndexBulkDeleteResult **)
palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
nblocks = RelationGetNumberOfBlocks(onerel);
+ Assert(start_blkno <= nblocks); /* both are the same if there are no blocks */
+
vacrelstats->rel_pages = nblocks;
vacrelstats->scanned_pages = 0;
vacrelstats->tupcount_pages = 0;
@@ -841,7 +877,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
- next_unskippable_block = 0;
+ next_unskippable_block = start_blkno;
if ((params->options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
while (next_unskippable_block < nblocks)
@@ -870,7 +906,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
else
skipping_blocks = false;
- for (blkno = 0; blkno < nblocks; blkno++)
+ for (blkno = start_blkno; blkno < nblocks; blkno++)
{
Buffer buf;
Page page;
@@ -1008,6 +1044,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
next_fsm_block_to_vacuum = blkno;
+ /* Save the current block number to resume vacuuming */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+
/* Report that we are once again scanning the heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_SCAN_HEAP);
@@ -1480,6 +1521,15 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
*/
dead_tuples->num_tuples = 0;
+ /* Save the current block number to resume vacuuming */
+ if (blkno - next_block_to_resume_vacuum >= VACUUM_RESUME_BLK_INTERVAL)
+ {
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ blkno);
+ next_block_to_resume_vacuum = blkno;
+ }
+
/*
* Periodically do incremental FSM vacuuming to make newly-freed
* space visible on upper FSM pages. Note: although we've cleaned
@@ -1644,6 +1694,11 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
if (blkno > next_fsm_block_to_vacuum)
FreeSpaceMapVacuumRange(onerel, next_fsm_block_to_vacuum, blkno);
+ /* Clear the saved block number */
+ pgstat_report_vacuum_resume_block(RelationGetRelid(onerel),
+ onerel->rd_rel->relisshared,
+ 0);
+
/* report all blocks vacuumed */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
@@ -3376,3 +3431,22 @@ parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
table_close(onerel, ShareUpdateExclusiveLock);
pfree(stats);
}
+
+/*
+ * Return the block number at which to resume vacuuming, as fetched from the
+ * stats collector.
+ */
+static BlockNumber
+get_resume_block(Relation onerel)
+{
+ PgStat_StatTabEntry *tabentry;
+
+ tabentry = pgstat_fetch_stat_tabentry(RelationGetRelid(onerel));
+
+ /* If no valid saved block number was found, resume from the first block */
+ if (tabentry == NULL ||
+ tabentry->vacuum_resume_block >= RelationGetNumberOfBlocks(onerel))
+ return (BlockNumber) 0;
+
+ Assert(tabentry->vacuum_resume_block >= 0);
+ return tabentry->vacuum_resume_block;
+}
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index f681aafcf9..b160787d1e 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -573,6 +573,7 @@ CREATE VIEW pg_stat_all_tables AS
pg_stat_get_live_tuples(C.oid) AS n_live_tup,
pg_stat_get_dead_tuples(C.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(C.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(C.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(C.oid) as last_vacuum,
pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
pg_stat_get_last_analyze_time(C.oid) as last_analyze,
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..6c8cf425b6 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -110,6 +110,7 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
/* Set default value */
params.index_cleanup = VACOPT_TERNARY_DEFAULT;
params.truncate = VACOPT_TERNARY_DEFAULT;
+ params.resume = VACOPT_TERNARY_DEFAULT;
/* By default parallel vacuum is enabled */
params.nworkers = 0;
@@ -141,6 +142,8 @@ ExecVacuum(ParseState *pstate, VacuumStmt *vacstmt, bool isTopLevel)
disable_page_skipping = defGetBoolean(opt);
else if (strcmp(opt->defname, "index_cleanup") == 0)
params.index_cleanup = get_vacopt_ternary_value(opt);
+ else if (strcmp(opt->defname, "resume") == 0)
+ params.resume = get_vacopt_ternary_value(opt);
else if (strcmp(opt->defname, "truncate") == 0)
params.truncate = get_vacopt_ternary_value(opt);
else if (strcmp(opt->defname, "parallel") == 0)
@@ -1840,6 +1843,16 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params)
params->truncate = VACOPT_TERNARY_DISABLED;
}
+ /* Set the resume option based on reloptions if not specified; default is false */
+ if (params->resume == VACOPT_TERNARY_DEFAULT)
+ {
+ if (onerel->rd_options == NULL ||
+ !((StdRdOptions *) onerel->rd_options)->vacuum_resume)
+ params->resume = VACOPT_TERNARY_DISABLED;
+ else
+ params->resume = VACOPT_TERNARY_ENABLED;
+ }
+
/*
* Remember the relation's TOAST relation for later, if the caller asked
* us to process it. In VACUUM FULL, though, the toast table is
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 462b4d7e06..65db5126d2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -321,6 +321,8 @@ static void pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, in
static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, int len);
static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
+static void pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg,
+ int len);
static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
static void pgstat_recv_archiver(PgStat_MsgArchiver *msg, int len);
static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
@@ -1421,6 +1423,27 @@ pgstat_report_vacuum(Oid tableoid, bool shared,
pgstat_send(&msg, sizeof(msg));
}
+/* ---------
+ * pgstat_report_vacuum_resume_block() -
+ *
+ * Tell the collector about the block number at which to resume vacuuming.
+ * ---------
+ */
+void
+pgstat_report_vacuum_resume_block(Oid tableoid, bool shared, BlockNumber blkno)
+{
+ PgStat_MsgVacuumResumeBlock msg;
+
+ if (pgStatSock == PGINVALID_SOCKET || !pgstat_track_counts)
+ return;
+
+ pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_VACUUMRESUMEBLOCK);
+ msg.m_databaseid = shared ? InvalidOid : MyDatabaseId;
+ msg.m_tableoid = tableoid;
+ msg.m_blkno = blkno;
+ pgstat_send(&msg, sizeof(msg));
+}
+
/* --------
* pgstat_report_analyze() -
*
@@ -4591,6 +4614,11 @@ PgstatCollectorMain(int argc, char *argv[])
pgstat_recv_vacuum(&msg.msg_vacuum, len);
break;
+ case PGSTAT_MTYPE_VACUUMRESUMEBLOCK:
+ pgstat_recv_vacuum_resume_block(&msg.msg_vacuum_resume_block,
+ len);
+ break;
+
case PGSTAT_MTYPE_ANALYZE:
pgstat_recv_analyze(&msg.msg_analyze, len);
break;
@@ -6200,6 +6228,20 @@ pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len)
}
}
+static void
+pgstat_recv_vacuum_resume_block(PgStat_MsgVacuumResumeBlock *msg, int len)
+{
+ PgStat_StatDBEntry *dbentry;
+ PgStat_StatTabEntry *tabentry;
+
+ /*
+ * Store the data in the table's hashtable entry.
+ */
+ dbentry = pgstat_get_db_entry(msg->m_databaseid, true);
+ tabentry = pgstat_get_tab_entry(dbentry, msg->m_tableoid, true);
+ tabentry->vacuum_resume_block = msg->m_blkno;
+}
+
/* ----------
* pgstat_recv_analyze() -
*
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 7e6a3c1774..427e47c1c1 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -195,6 +195,20 @@ pg_stat_get_mod_since_analyze(PG_FUNCTION_ARGS)
PG_RETURN_INT64(result);
}
+Datum
+pg_stat_get_vacuum_resume_block(PG_FUNCTION_ARGS)
+{
+ Oid relid = PG_GETARG_OID(0);
+ int64 result;
+ PgStat_StatTabEntry *tabentry;
+
+ if ((tabentry = pgstat_fetch_stat_tabentry(relid)) == NULL)
+ result = 0;
+ else
+ result = (int64) (tabentry->vacuum_resume_block);
+
+ PG_RETURN_INT64(result);
+}
Datum
pg_stat_get_blocks_fetched(PG_FUNCTION_ARGS)
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 07a86c7b7b..9cd785d4c4 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -5129,6 +5129,11 @@
proname => 'pg_stat_get_mod_since_analyze', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
prosrc => 'pg_stat_get_mod_since_analyze' },
+{ oid => '8001',
+ descr => 'statistics: block number to resume vacuuming',
+ proname => 'pg_stat_get_vacuum_resume_block', provolatile => 's',
+ proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
+ prosrc => 'pg_stat_get_vacuum_resume_block' },
{ oid => '1934', descr => 'statistics: number of blocks fetched',
proname => 'pg_stat_get_blocks_fetched', provolatile => 's',
proparallel => 'r', prorettype => 'int8', proargtypes => 'oid',
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index c27d255d8d..cf3c2919e7 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -222,6 +222,8 @@ typedef struct VacuumParams
* default value depends on reloptions */
VacOptTernaryValue truncate; /* Truncate empty pages at the end,
* default value depends on reloptions */
+ VacOptTernaryValue resume; /* Resume vacuuming from the last vacuumed
+ * block */
/*
* The number of parallel vacuum workers. 0 by default which means choose
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 3a65a51696..8e991927d7 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -56,6 +56,7 @@ typedef enum StatMsgType
PGSTAT_MTYPE_RESETSINGLECOUNTER,
PGSTAT_MTYPE_AUTOVAC_START,
PGSTAT_MTYPE_VACUUM,
+ PGSTAT_MTYPE_VACUUMRESUMEBLOCK,
PGSTAT_MTYPE_ANALYZE,
PGSTAT_MTYPE_ARCHIVER,
PGSTAT_MTYPE_BGWRITER,
@@ -371,6 +372,14 @@ typedef struct PgStat_MsgVacuum
PgStat_Counter m_dead_tuples;
} PgStat_MsgVacuum;
+typedef struct PgStat_MsgVacuumResumeBlock
+{
+ PgStat_MsgHdr m_hdr;
+ Oid m_databaseid;
+ Oid m_tableoid;
+ BlockNumber m_blkno;
+} PgStat_MsgVacuumResumeBlock;
+
/* ----------
* PgStat_MsgAnalyze Sent by the backend or autovacuum daemon
@@ -561,6 +570,7 @@ typedef union PgStat_Msg
PgStat_MsgResetsinglecounter msg_resetsinglecounter;
PgStat_MsgAutovacStart msg_autovacuum_start;
PgStat_MsgVacuum msg_vacuum;
+ PgStat_MsgVacuumResumeBlock msg_vacuum_resume_block;
PgStat_MsgAnalyze msg_analyze;
PgStat_MsgArchiver msg_archiver;
PgStat_MsgBgWriter msg_bgwriter;
@@ -650,6 +660,8 @@ typedef struct PgStat_StatTabEntry
PgStat_Counter blocks_fetched;
PgStat_Counter blocks_hit;
+ BlockNumber vacuum_resume_block;
+
TimestampTz vacuum_timestamp; /* user initiated vacuum */
PgStat_Counter vacuum_count;
TimestampTz autovac_vacuum_timestamp; /* autovacuum initiated */
@@ -1264,6 +1276,8 @@ extern void pgstat_reset_single_counter(Oid objectid, PgStat_Single_Reset_Type t
extern void pgstat_report_autovac(Oid dboid);
extern void pgstat_report_vacuum(Oid tableoid, bool shared,
PgStat_Counter livetuples, PgStat_Counter deadtuples);
+extern void pgstat_report_vacuum_resume_block(Oid tableoid, bool shared,
+ BlockNumber blkno);
extern void pgstat_report_analyze(Relation rel,
PgStat_Counter livetuples, PgStat_Counter deadtuples,
bool resetcounter);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 44ed04dd3f..24e7b14a24 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -277,6 +277,8 @@ typedef struct StdRdOptions
int parallel_workers; /* max number of parallel workers */
bool vacuum_index_cleanup; /* enables index vacuuming and cleanup */
bool vacuum_truncate; /* enables vacuum to truncate a relation */
+ bool vacuum_resume; /* enables vacuum to resume from last
+ * vacuumed block. */
} StdRdOptions;
#define HEAP_MIN_FILLFACTOR 10
diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out
index 634f8256f7..55d6b439c3 100644
--- a/src/test/regress/expected/rules.out
+++ b/src/test/regress/expected/rules.out
@@ -1778,6 +1778,7 @@ pg_stat_all_tables| SELECT c.oid AS relid,
pg_stat_get_live_tuples(c.oid) AS n_live_tup,
pg_stat_get_dead_tuples(c.oid) AS n_dead_tup,
pg_stat_get_mod_since_analyze(c.oid) AS n_mod_since_analyze,
+ pg_stat_get_vacuum_resume_block(c.oid) AS vacuum_resume_blk,
pg_stat_get_last_vacuum_time(c.oid) AS last_vacuum,
pg_stat_get_last_autovacuum_time(c.oid) AS last_autovacuum,
pg_stat_get_last_analyze_time(c.oid) AS last_analyze,
@@ -2034,6 +2035,7 @@ pg_stat_sys_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
@@ -2077,6 +2079,7 @@ pg_stat_user_tables| SELECT pg_stat_all_tables.relid,
pg_stat_all_tables.n_live_tup,
pg_stat_all_tables.n_dead_tup,
pg_stat_all_tables.n_mod_since_analyze,
+ pg_stat_all_tables.vacuum_resume_blk,
pg_stat_all_tables.last_vacuum,
pg_stat_all_tables.last_autovacuum,
pg_stat_all_tables.last_analyze,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 0cfe28e63f..d1ca94a224 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -182,6 +182,25 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+VACUUM (RESUME TRUE) resume_test;
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -254,6 +273,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
CREATE TABLE vacowned_parted (a int) PARTITION BY LIST (a);
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index cf741f7b11..ff70290e99 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -152,6 +152,26 @@ SELECT pg_relation_size('vac_truncate_test') = 0;
VACUUM (TRUNCATE FALSE, FULL TRUE) vac_truncate_test;
DROP TABLE vac_truncate_test;
+-- RESUME option
+CREATE TABLE resume_test (i INT PRIMARY KEY, t TEXT);
+INSERT INTO resume_test(i, t) VALUES (generate_series(1,30),
+ repeat('1234567890',300));
+VACUUM (RESUME TRUE) resume_test;
+-- resume option is ignored
+VACUUM (RESUME TRUE, FREEZE TRUE) resume_test;
+VACUUM (RESUME TRUE, FULL TRUE) resume_test;
+VACUUM (RESUME TRUE, DISABLE_PAGE_SKIPPING TRUE) resume_test;
+-- Only parent enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = true,
+ toast.vacuum_resume = false);
+VACUUM (RESUME TRUE) resume_test;
+-- Only toast table enables resuming
+ALTER TABLE resume_test SET (vacuum_resume = false,
+ toast.vacuum_resume = true);
+VACUUM (RESUME TRUE) resume_test;
+-- Test some extra relations.
+VACUUM (RESUME TRUE) vaccluster;
+VACUUM (RESUME TRUE) vactst;
+
-- partitioned table
CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
@@ -214,6 +234,7 @@ DROP TABLE vaccluster;
DROP TABLE vactst;
DROP TABLE vacparted;
DROP TABLE no_index_cleanup;
+DROP TABLE resume_test;
-- relation ownership, WARNING logs generated as all are skipped.
CREATE TABLE vacowned (a int);
The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, failed
Spec compliant: not tested
Documentation: not tested
Please fix the regression test cases.
The new status of this patch is: Waiting on Author
On 2/28/20 8:56 AM, Masahiko Sawada wrote:
According to those results, the closer to the tail of the table the vacuum resumes, the greater the efficiency gain. Since the table was being updated uniformly even during autovacuum, it was more efficient to restart autovacuum from the last position rather than from the beginning of the table. I think those results show some of the benefit of this patch, but I'm concerned that it might be difficult for users to know when to use this option. In practice the efficiency depends entirely on the distribution of updated pages, and that test dirtied pages uniformly, which is not a common situation. So if we want this feature, I think we should automatically enable resuming when we can be reasonably sure that resuming is better. For example, we could remember both the last vacuumed block and an estimate of how many vacuumable pages exist after it, and decide to resume vacuuming only if we can expect to process many more pages.
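The decision rule sketched above could look something like the following. This is only an illustration of the proposed heuristic, not code from the patch: the function name, the extra "vacuumable pages after the resume point" statistic, and the one-half threshold are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/*
 * Hypothetical auto-resume heuristic: besides the last vacuumed block,
 * assume the stats collector also tracked an estimate of how many
 * vacuumable pages lie at or after that block.  Resume only when most of
 * the expected work is ahead of the saved position; otherwise restart
 * from block 0.  The threshold (more than half) is illustrative.
 */
static bool
should_resume_vacuum(BlockNumber resume_blkno,
                     BlockNumber nblocks,
                     uint64_t vacuumable_after_resume,
                     uint64_t vacuumable_total)
{
    /* No saved position, or a position past the end of a truncated table */
    if (resume_blkno == 0 || resume_blkno >= nblocks)
        return false;

    /* No information about remaining work: fall back to a full scan */
    if (vacuumable_total == 0)
        return false;

    /* Resume only if most of the expected work lies after the saved block */
    return vacuumable_after_resume * 2 > vacuumable_total;
}
```

With this shape, a mostly append-only workload (garbage concentrated near the tail) would resume, while a uniformly updated table with little garbage past the saved block would restart from the beginning.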
I have to say I'm a bit confused by the point of this patch. I get that
resuming in progress is faster, but isn't that only true because the
entire table is not being vacuumed?
If as you say:
If we start to vacuum from a block other than the first, we can update
neither relfrozenxid nor relminmxid. And we might not even be able to
update the relation statistics.
Then we'll still need to vacuum the entire table before we can be sure
the oldest xid has been removed/frozen. If we could do those updates on
a resume then that would change my thoughts on the feature a lot.
What am I missing?
I'm marking this Returned with Feedback due to the concerns expressed
up-thread (and mine) and because the patch has been Waiting on Author for
nearly the entire CF.
Regards,
--
-David
david@pgmasters.net