Block level parallel vacuum WIP

Started by Masahiko Sawadaover 9 years ago50 messages

Masahiko Sawada

sawada.mshk@gmail.com

over 9 years ago

2 attachment(s)

Hi all,

I'd like to propose block level parallel VACUUM.
This feature makes VACUUM possible to use multiple CPU cores.

Vacuum Processing Logic
===================

PostgreSQL VACUUM processing logic consists of 2 phases,
1. Collecting dead tuple locations on heap.
2. Reclaiming dead tuples from heap and indexes.
These phases 1 and 2 are executed alternately, and once amount of dead
tuple location reached maintenance_work_mem in phase 1, phase 2 will
be executed.

Basic Design
==========

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
To use visibility map efficiency, each worker scan particular block
range of relation and collect dead tuple locations.
After each worker finished task, the leader process gathers these
vacuum statistics information and update relfrozenxid if possible.

I also changed the buffer lock infrastructure so that multiple
processes can wait for cleanup lock on a buffer.
And the new GUC parameter vacuum_parallel_workers controls the number
of vacuum workers.

Performance(PoC)
=========

I ran parallel vacuum on 13GB table (pgbench scale 1000) with several
workers (on my poor virtual machine).
The result is,

1. Vacuum whole table without index (disable page skipping)
1 worker : 33 sec
2 workers : 27 sec
3 workers : 23 sec
4 workers : 22 sec

2. Vacuum table and index (after 10000 transaction executed)
1 worker : 12 sec
2 workers : 49 sec
3 workers : 54 sec
4 workers : 53 sec

As a result of my test, since multiple process could frequently try to
acquire the cleanup lock on same index buffer, execution time of
parallel vacuum got worse.
And it seems to be effective for only table vacuum so far, but is not
improved as expected (maybe disk bottleneck).

Another Design
============
ISTM that processing index vacuum by multiple process is not good idea
in most cases because many index items can be stored in a page and
multiple vacuum worker could try to require the cleanup lock on the
same index buffer.
It's rather better that multiple workers process particular block
range and then multiple workers process each particular block range,
and then one worker per index processes index vacuum.

Still lots of work to do but attached PoC patch.
Feedback and suggestion are very welcome.

Regards,

--
Masahiko Sawada

Attachments:

0001-Allow-muliple-backends-to-wait-for-cleanup-lock.patchtext/plain; charset=US-ASCII; name=0001-Allow-muliple-backends-to-wait-for-cleanup-lock.patchDownload

From b25d491a05a43fb7adf014b2580c71ec7adb75a2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 8 Aug 2016 16:43:35 -0700
Subject: [PATCH 1/2] Allow muliple backends to wait for cleanup lock.

---
 src/backend/storage/buffer/buf_init.c |  3 +-
 src/backend/storage/buffer/bufmgr.c   | 57 +++++++++++++++++++++++------------
 src/include/storage/buf_internals.h   |  4 ++-
 src/include/storage/proc.h            |  2 ++
 4 files changed, 45 insertions(+), 21 deletions(-)

diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index a4163cf..2aad030 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -134,7 +134,8 @@ InitBufferPool(void)
 			CLEAR_BUFFERTAG(buf->tag);
 
 			pg_atomic_init_u32(&buf->state, 0);
-			buf->wait_backend_pid = 0;
+			dlist_init(&buf->pin_count_waiters);
+			pg_atomic_write_u32(&buf->nwaiters, 0);
 
 			buf->buf_id = i;
 
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 76ade37..f2f4ab9 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -38,6 +38,7 @@
 #include "catalog/storage.h"
 #include "executor/instrument.h"
 #include "lib/binaryheap.h"
+#include "lib/ilist.h"
 #include "miscadmin.h"
 #include "pg_trace.h"
 #include "pgstat.h"
@@ -1730,15 +1731,19 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
 			 */
 			buf_state = LockBufHdr(buf);
 
-			if ((buf_state & BM_PIN_COUNT_WAITER) &&
-				BUF_STATE_GET_REFCOUNT(buf_state) == 1)
+			if (buf_state & BM_PIN_COUNT_WAITER)
 			{
-				/* we just released the last pin other than the waiter's */
-				int			wait_backend_pid = buf->wait_backend_pid;
+				dlist_mutable_iter iter;
 
-				buf_state &= ~BM_PIN_COUNT_WAITER;
+				if (pg_atomic_read_u32(&buf->nwaiters) == 1)
+					buf_state &= ~BM_PIN_COUNT_WAITER;
+
+				dlist_foreach_modify(iter, &buf->pin_count_waiters)
+				{
+					PGPROC *waiter = dlist_container(PGPROC, clWaitLink, iter.cur);
+					ProcSendSignal(waiter->pid);
+				}
 				UnlockBufHdr(buf, buf_state);
-				ProcSendSignal(wait_backend_pid);
 			}
 			else
 				UnlockBufHdr(buf, buf_state);
@@ -3513,8 +3518,17 @@ UnlockBuffers(void)
 		 * got a cancel/die interrupt before getting the signal.
 		 */
 		if ((buf_state & BM_PIN_COUNT_WAITER) != 0 &&
-			buf->wait_backend_pid == MyProcPid)
-			buf_state &= ~BM_PIN_COUNT_WAITER;
+			pg_atomic_read_u32(&buf->nwaiters) == 1)
+		{
+			dlist_mutable_iter iter;
+
+			dlist_foreach_modify(iter, &buf->pin_count_waiters)
+			{
+				PGPROC *waiter = dlist_container(PGPROC, clWaitLink, iter.cur);
+				if (waiter->pid == MyProcPid)
+					buf_state &= ~BM_PIN_COUNT_WAITER;
+			}
+		}
 
 		UnlockBufHdr(buf, buf_state);
 
@@ -3616,20 +3630,24 @@ LockBufferForCleanup(Buffer buffer)
 		buf_state = LockBufHdr(bufHdr);
 
 		Assert(BUF_STATE_GET_REFCOUNT(buf_state) > 0);
-		if (BUF_STATE_GET_REFCOUNT(buf_state) == 1)
+		/*
+		 * If refcount == 1 then we can break immediately.
+		 * In case of refcount > 1, if refcount == (nwaiters + 1) then break.
+		 * Because refcount include other processes and itself, but nwaiters
+		 * includes only other processes.
+		 */
+		if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 ||
+			((BUF_STATE_GET_REFCOUNT(buf_state) - 1)==
+			 pg_atomic_read_u32(&bufHdr->nwaiters)))
 		{
 			/* Successfully acquired exclusive lock with pincount 1 */
 			UnlockBufHdr(bufHdr, buf_state);
 			return;
 		}
 		/* Failed, so mark myself as waiting for pincount 1 */
-		if (buf_state & BM_PIN_COUNT_WAITER)
-		{
-			UnlockBufHdr(bufHdr, buf_state);
-			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
-			elog(ERROR, "multiple backends attempting to wait for pincount 1");
-		}
-		bufHdr->wait_backend_pid = MyProcPid;
+		pg_atomic_fetch_add_u32(&bufHdr->nwaiters, 1);
+		dlist_push_tail(&bufHdr->pin_count_waiters, &MyProc->clWaitLink);
+
 		PinCountWaitBuf = bufHdr;
 		buf_state |= BM_PIN_COUNT_WAITER;
 		UnlockBufHdr(bufHdr, buf_state);
@@ -3662,9 +3680,10 @@ LockBufferForCleanup(Buffer buffer)
 		 * better be safe.
 		 */
 		buf_state = LockBufHdr(bufHdr);
-		if ((buf_state & BM_PIN_COUNT_WAITER) != 0 &&
-			bufHdr->wait_backend_pid == MyProcPid)
-			buf_state &= ~BM_PIN_COUNT_WAITER;
+
+		dlist_delete(&MyProc->clWaitLink);
+		pg_atomic_fetch_sub_u32(&bufHdr->nwaiters, 1);
+
 		UnlockBufHdr(bufHdr, buf_state);
 
 		PinCountWaitBuf = NULL;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index e0dfb2f..90fcbd7 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -182,7 +182,9 @@ typedef struct BufferDesc
 	/* state of the tag, containing flags, refcount and usagecount */
 	pg_atomic_uint32 state;
 
-	int			wait_backend_pid;		/* backend PID of pin-count waiter */
+	dlist_head	pin_count_waiters;	/* backend PIDs of pin-count waiters */
+	pg_atomic_uint32	nwaiters;
+
 	int			freeNext;		/* link in freelist chain */
 
 	LWLock		content_lock;	/* to lock access to buffer contents */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index f576f05..4cd9416 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -123,6 +123,8 @@ struct PGPROC
 	LOCKMASK	heldLocks;		/* bitmask for lock types already held on this
 								 * lock object by this backend */
 
+	dlist_node	clWaitLink;		/* position in Cleanup Lock wait list */
+
 	/*
 	 * Info to allow us to wait for synchronous replication, if needed.
 	 * waitLSN is InvalidXLogRecPtr if not waiting; set only by user backend.
-- 
2.8.1

0002-Block-level-parallel-Vacuum.patchtext/plain; charset=US-ASCII; name=0002-Block-level-parallel-Vacuum.patchDownload

From 1955cb9f68f9c027c566c4909d7ead475cf20f3b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 8 Aug 2016 16:43:55 -0700
Subject: [PATCH 2/2] Block level parallel Vacuum.

---
 src/backend/access/nbtree/nbtutils.c |  19 ---
 src/backend/commands/vacuum.c        |   1 +
 src/backend/commands/vacuumlazy.c    | 239 ++++++++++++++++++++++++++++-------
 src/backend/utils/misc/guc.c         |  10 ++
 src/include/commands/vacuum.h        |  41 +++++-
 src/include/storage/buf_internals.h  |   1 +
 6 files changed, 245 insertions(+), 66 deletions(-)

diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 5d335c7..987aceb 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1918,25 +1918,6 @@ _bt_start_vacuum(Relation rel)
 	if (result == 0 || result > MAX_BT_CYCLE_ID)
 		result = btvacinfo->cycle_ctr = 1;
 
-	/* Let's just make sure there's no entry already for this index */
-	for (i = 0; i < btvacinfo->num_vacuums; i++)
-	{
-		vac = &btvacinfo->vacuums[i];
-		if (vac->relid.relId == rel->rd_lockInfo.lockRelId.relId &&
-			vac->relid.dbId == rel->rd_lockInfo.lockRelId.dbId)
-		{
-			/*
-			 * Unlike most places in the backend, we have to explicitly
-			 * release our LWLock before throwing an error.  This is because
-			 * we expect _bt_end_vacuum() to be called before transaction
-			 * abort cleanup can run to release LWLocks.
-			 */
-			LWLockRelease(BtreeVacuumLock);
-			elog(ERROR, "multiple active vacuums for index \"%s\"",
-				 RelationGetRelationName(rel));
-		}
-	}
-
 	/* OK, add an entry */
 	if (btvacinfo->num_vacuums >= btvacinfo->max_vacuums)
 	{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0563e63..1562773 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -58,6 +58,7 @@ int			vacuum_freeze_min_age;
 int			vacuum_freeze_table_age;
 int			vacuum_multixact_freeze_min_age;
 int			vacuum_multixact_freeze_table_age;
+int			parallel_vacuum_workers;
 
 
 /* A few variables that don't seem worth passing around as parameters */
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 231e92d..4fc880d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -42,8 +42,10 @@
 #include "access/heapam_xlog.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/parallel.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -98,33 +100,9 @@
  */
 #define SKIP_PAGES_THRESHOLD	((BlockNumber) 32)
 
-typedef struct LVRelStats
-{
-	/* hasindex = true means two-pass strategy; false means one-pass */
-	bool		hasindex;
-	/* Overall statistics about rel */
-	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
-	BlockNumber rel_pages;		/* total number of pages */
-	BlockNumber scanned_pages;	/* number of pages we examined */
-	BlockNumber pinskipped_pages;		/* # of pages we skipped due to a pin */
-	BlockNumber frozenskipped_pages;	/* # of frozen pages we skipped */
-	double		scanned_tuples; /* counts only tuples on scanned pages */
-	double		old_rel_tuples; /* previous value of pg_class.reltuples */
-	double		new_rel_tuples; /* new estimated total # of tuples */
-	double		new_dead_tuples;	/* new estimated total # of dead tuples */
-	BlockNumber pages_removed;
-	double		tuples_deleted;
-	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
-	/* List of TIDs of tuples we intend to delete */
-	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
-	int			num_index_scans;
-	TransactionId latestRemovedXid;
-	bool		lock_waiter_detected;
-} LVRelStats;
-
+/* DSM key for block-level parallel vacuum */
+#define VACUUM_KEY_TASK	50
+#define VACUUM_KEY_WORKER_STATS 51
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -137,9 +115,6 @@ static BufferAccessStrategy vac_strategy;
 
 
 /* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
-			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
-			   bool aggressive);
 static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
@@ -162,6 +137,18 @@ static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 					 TransactionId *visibility_cutoff_xid, bool *all_frozen);
 
+/* functions for parallel vacuum */
+static void parallel_lazy_scan_heap(Relation rel, LVRelStats *vacrelstats,
+									Relation *Irel, int nindexes, int options,
+									bool aggressive, int wnum);
+static void vacuum_worker(dsm_segment *seg, shm_toc *toc);
+static void lazy_scan_heap(Relation onerel, int options,
+								  LVRelStats *vacrelstats, Relation *Irel,
+								  int nindexes, bool aggressive,
+								  BlockNumber begin, BlockNumber nblocks);
+static void gather_vacuum_stats(LVRelStats *valrelstats, LVRelStats *worker_stats,
+								int wnum);
+
 
 /*
  *	lazy_vacuum_rel() -- perform LAZY VACUUM for one heap relation
@@ -248,8 +235,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
 	vacrelstats->hasindex = (nindexes > 0);
 
-	/* Do the vacuuming */
-	lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
+	/* Do the parallel vacuuming. */
+	if (parallel_vacuum_workers > 1)
+		parallel_lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, options,
+								aggressive, parallel_vacuum_workers);
+	else
+	{
+		BlockNumber nblocks = RelationGetNumberOfBlocks(onerel);
+
+		lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes,
+					   aggressive, 0, nblocks);
+	}
 
 	/* Done with indexes */
 	vac_close_indexes(nindexes, Irel, NoLock);
@@ -428,7 +424,132 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
- *	lazy_scan_heap() -- scan an open heap relation
+ * Launch parallel vacuum workers specified by vacuum_parallel_workers and then
+ * gather the result stats of each workers. The idea of vacuuming one relation
+ * with multiple workers parallely is that each worker is assigned particlar block
+ * range of relation which is calculated using by parallel_vacuum_workers and
+ * the number of relation blocks. The informations and some threshoulds (e.g.
+ * OldestXmin, FreezeLimit, MultiXactCufoff) are stored into DSM tagged by
+ * VACUUM_KEY_TASK. Each worker can collect the garbage tid and reclaims them as
+ * well.  Vacuum statistics for each workers are stored into DSm tagged by
+ * VACUUM_KEY_WORKER_STATS, that will be gathered by the leader process after all
+ * worker finished its task.
+ */
+static void
+parallel_lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
+						Relation *Irel, int nindexes, int options,
+						bool aggressive, int wnum)
+{
+	ParallelContext	*pcxt;
+	LVRelStats *wstats_space;
+	VacuumTask	*task_space;
+	IndexBulkDeleteResult **indstats;
+	int size = 0;
+	int	i;
+
+	EnterParallelMode();
+
+	/* Create parallel context and initialize it */
+	pcxt = CreateParallelContext(vacuum_worker, wnum);
+	size += BUFFERALIGN(sizeof(VacuumTask)); /* For task */
+	size += BUFFERALIGN(sizeof(LVRelStats) * pcxt->nworkers); /* For worker stats */
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, 2);
+	InitializeParallelDSM(pcxt);
+
+	/* Prepare for VacuumTask space */
+	task_space = (VacuumTask *)shm_toc_allocate(pcxt->toc, sizeof(VacuumTask));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_TASK, task_space);
+	task_space->relid = RelationGetRelid(onerel);
+	task_space->aggressive = aggressive;
+	task_space->options = options;
+	task_space->oldestxmin = OldestXmin;
+	task_space->freezelimit = FreezeLimit;
+	task_space->multixactcutoff = MultiXactCutoff;
+	task_space->wnum = wnum;
+	task_space->elevel = elevel;
+
+	/* Prepare for worker LVRelStats space */
+	wstats_space = (LVRelStats *)shm_toc_allocate(pcxt->toc,
+												 sizeof(LVRelStats) * pcxt->nworkers);
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_WORKER_STATS, wstats_space);
+	for (i = 0; i < pcxt->nworkers; i++)
+	{
+		LVRelStats *wstats = wstats_space + sizeof(LVRelStats) * i;
+		memcpy(wstats, vacrelstats, sizeof(LVRelStats));
+	}
+
+	/* Do parallel vacuum */
+	LaunchParallelWorkers(pcxt);
+
+	/* Wait for workers finising vacuuming */
+	WaitForParallelWorkersToFinish(pcxt);
+	gather_vacuum_stats(vacrelstats, wstats_space, wnum);
+
+	DestroyParallelContext(pcxt);
+	ExitParallelMode();
+
+	indstats = (IndexBulkDeleteResult **)
+		palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
+
+	/* Do post-vacuum cleanup and statistics update for each index */
+	for (i = 0; i < nindexes; i++)
+		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+}
+
+/*
+ * Entry function for parallel vacuum worker. Each worker calculates the
+ * starting block number and number of blocks need to process, and then
+ * does vacuuming particular block range of relation.
+ */
+static void
+vacuum_worker(dsm_segment *seg, shm_toc *toc)
+{
+	VacuumTask	*task;
+	LVRelStats	*wstats_space;
+	LVRelStats	*wstats;
+	Relation	rel;
+	BlockNumber	begin;
+	BlockNumber nblocks_per_worker;
+	BlockNumber	nblocks;
+	int			nindexes;
+	Relation	*Irel;
+
+	/* Set up task information */
+	task = (VacuumTask *)shm_toc_lookup(toc, VACUUM_KEY_TASK);
+	OldestXmin = task->oldestxmin;
+	FreezeLimit = task->freezelimit;
+	MultiXactCutoff = task->multixactcutoff;
+
+	/* Set up message queue */
+	wstats_space = (LVRelStats *)shm_toc_lookup(toc, VACUUM_KEY_WORKER_STATS);
+	wstats = wstats_space + sizeof(LVRelStats) * ParallelWorkerNumber;
+
+	/* Calculate how many blocks the worker should process */
+	rel = heap_open(task->relid, NoLock);
+	vac_open_indexes(rel, RowExclusiveLock, &nindexes, &Irel);
+	nblocks_per_worker = RelationGetNumberOfBlocks(rel) / parallel_vacuum_workers;
+	begin = nblocks_per_worker * ParallelWorkerNumber;
+
+	/* The last worker processes remaining blocks */
+	if (ParallelWorkerNumber == (task->wnum - 1))
+		nblocks = RelationGetNumberOfBlocks(rel) - begin;
+	else
+		nblocks = nblocks_per_worker;
+
+	/* Set up elevel */
+	elevel = task->elevel;
+
+	/* Do vacuuming particular area */
+	lazy_scan_heap(rel, task->options, wstats, Irel, nindexes,
+				   task->aggressive, begin, nblocks);
+
+	heap_close(rel, NoLock);
+	vac_close_indexes(nindexes, Irel, NoLock);
+}
+
+/*
+ *	lazy_scan_heap() -- scan paritclar range of open heap relation
  *
  *		This routine prunes each page in the heap, which will among other
  *		things truncate dead tuples to dead line pointers, defragment the
@@ -445,10 +566,10 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
  */
 static void
 lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive)
+			   Relation	*Irel, int nindexes, bool aggressive,
+			   BlockNumber begin, BlockNumber nblocks)
 {
-	BlockNumber nblocks,
-				blkno;
+	BlockNumber blkno;
 	HeapTupleData tuple;
 	char	   *relname;
 	BlockNumber empty_pages,
@@ -471,14 +592,15 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		PROGRESS_VACUUM_MAX_DEAD_TUPLES
 	};
 	int64		initprog_val[3];
+	BlockNumber end = begin + nblocks;
 
 	pg_rusage_init(&ru0);
 
 	relname = RelationGetRelationName(onerel);
 	ereport(elevel,
-			(errmsg("vacuuming \"%s.%s\"",
+			(errmsg("vacuuming \"%s.%s\", from block %u to %u, %u blocks",
 					get_namespace_name(RelationGetNamespace(onerel)),
-					relname)));
+					relname, begin, end, nblocks)));
 
 	empty_pages = vacuumed_pages = 0;
 	num_tuples = tups_vacuumed = nkeep = nunused = 0;
@@ -486,7 +608,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	indstats = (IndexBulkDeleteResult **)
 		palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
 
-	nblocks = RelationGetNumberOfBlocks(onerel);
 	vacrelstats->rel_pages = nblocks;
 	vacrelstats->scanned_pages = 0;
 	vacrelstats->nonempty_pages = 0;
@@ -545,10 +666,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	 * the last page.  This is worth avoiding mainly because such a lock must
 	 * be replayed on any hot standby, where it can be disruptive.
 	 */
-	next_unskippable_block = 0;
+	next_unskippable_block = begin;
 	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
 	{
-		while (next_unskippable_block < nblocks)
+		while (next_unskippable_block < end)
 		{
 			uint8		vmstatus;
 
@@ -574,7 +695,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	else
 		skipping_blocks = false;
 
-	for (blkno = 0; blkno < nblocks; blkno++)
+	for (blkno = begin; blkno < end; blkno++)
 	{
 		Buffer		buf;
 		Page		page;
@@ -1306,10 +1427,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
 
-	/* Do post-vacuum cleanup and statistics update for each index */
-	for (i = 0; i < nindexes; i++)
-		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
-
 	/* If no indexes, make log report that lazy_vacuum_heap would've made */
 	if (vacuumed_pages)
 		ereport(elevel,
@@ -1317,6 +1434,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						RelationGetRelationName(onerel),
 						tups_vacuumed, vacuumed_pages)));
 
+	/* Do post-vacuum cleanup and statistics update for each index */
+	if (!IsParallelWorker())
+	{
+		for (i = 0; i < nindexes; i++)
+			lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+	}
+
 	/*
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
@@ -1347,6 +1471,29 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pfree(buf.data);
 }
 
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+gather_vacuum_stats(LVRelStats *vacrelstats, LVRelStats *worker_stats, int wnum)
+{
+	int	i;
+
+	/* Gather each worker stats */
+	for (i = 0; i < wnum; i++)
+	{
+		LVRelStats *wstats = worker_stats + sizeof(LVRelStats) * i;
+
+		vacrelstats->rel_pages += wstats->rel_pages;
+		vacrelstats->scanned_pages += wstats->scanned_pages;
+		vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+		vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+		vacrelstats->scanned_tuples += wstats->scanned_tuples;
+		vacrelstats->new_rel_tuples += wstats->new_rel_tuples;
+		vacrelstats->pages_removed += wstats->pages_removed;
+		vacrelstats->nonempty_pages += wstats->nonempty_pages;
+	}
+}
 
 /*
  *	lazy_vacuum_heap() -- second pass over the heap
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c5178f7..0dd64bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2661,6 +2661,16 @@ static struct config_int ConfigureNamesInt[] =
 	},
 
 	{
+		{"parallel_vacuum_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
+			gettext_noop("Sets the number of parallel worker for vacuum."),
+			NULL
+		},
+		&parallel_vacuum_workers,
+		1, 1, 1024,
+		NULL, NULL, NULL
+	},
+
+	{
 		{"autovacuum_work_mem", PGC_SIGHUP, RESOURCES_MEM,
 			gettext_noop("Sets the maximum memory to be used by each autovacuum worker process."),
 			NULL,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 80cd4a8..fc46c09 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -147,6 +147,45 @@ typedef struct VacuumParams
 										 * activated, -1 to use default */
 } VacuumParams;
 
+typedef struct LVRelStats
+{
+	/* hasindex = true means two-pass strategy; false means one-pass */
+	bool		hasindex;
+	/* Overall statistics about rel */
+	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
+	BlockNumber rel_pages;		/* total number of pages */
+	BlockNumber scanned_pages;	/* number of pages we examined */
+	BlockNumber pinskipped_pages;		/* # of pages we skipped due to a pin */
+	BlockNumber frozenskipped_pages;	/* # of frozen pages we skipped */
+	double		scanned_tuples; /* counts only tuples on scanned pages */
+	double		old_rel_tuples; /* previous value of pg_class.reltuples */
+	double		new_rel_tuples; /* new estimated total # of tuples */
+	double		new_dead_tuples;	/* new estimated total # of dead tuples */
+	BlockNumber pages_removed;
+	double		tuples_deleted;
+	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+	/* List of TIDs of tuples we intend to delete */
+	/* NB: this list is ordered by TID address */
+	int			num_dead_tuples;	/* current # of entries */
+	int			max_dead_tuples;	/* # slots allocated in array */
+	ItemPointer dead_tuples;	/* array of ItemPointerData */
+	int			num_index_scans;
+	TransactionId latestRemovedXid;
+	bool		lock_waiter_detected;
+} LVRelStats;
+
+typedef struct VacuumTask
+{
+	Oid			relid;	/* Target relation oid */
+	bool		aggressive;	/* does each worker need to aggressive vacuum? */
+	int			options; /* Specified vacuum options */
+	TransactionId	oldestxmin;
+	TransactionId	freezelimit;
+	MultiXactId		multixactcutoff;
+	int			wnum;
+	int			elevel;
+} VacuumTask;
+
 /* GUC parameters */
 extern PGDLLIMPORT int default_statistics_target;		/* PGDLLIMPORT for
 														 * PostGIS */
@@ -154,7 +193,7 @@ extern int	vacuum_freeze_min_age;
 extern int	vacuum_freeze_table_age;
 extern int	vacuum_multixact_freeze_min_age;
 extern int	vacuum_multixact_freeze_table_age;
-
+extern int	parallel_vacuum_workers;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 90fcbd7..4f9d986 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -15,6 +15,7 @@
 #ifndef BUFMGR_INTERNALS_H
 #define BUFMGR_INTERNALS_H
 
+#include "lib/ilist.h"
 #include "storage/buf.h"
 #include "storage/bufmgr.h"
 #include "storage/latch.h"
-- 
2.8.1

Васильев Дмитрий

d.vasilyev@postgrespro.ru

over 9 years ago

In reply to: Masahiko Sawada (#1)

Re: Block level parallel vacuum WIP

I repeat your test on ProLiant DL580 Gen9 with Xeon E7-8890 v3.

pgbench -s 100 and command vacuum pgbench_acounts after 10_000 transactions:

with: alter system set vacuum_cost_delay to DEFAULT;
parallel_vacuum_workers | time
1 | 138.703,263 ms
2 | 83.751,064 ms
4 | 66.105,861 ms
8 | 59.820,171 ms

with: alter system set vacuum_cost_delay to 1;
parallel_vacuum_workers | time
1 | 127.210,896 ms
2 | 75.300,278 ms
4 | 64.253,087 ms
8 | 60.130,953

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.ru
The Russian Postgres Company

2016-08-23 14:02 GMT+03:00 Masahiko Sawada <sawada.mshk@gmail.com>:

Show quoted text

Hi all,

I'd like to propose block level parallel VACUUM.
This feature makes VACUUM possible to use multiple CPU cores.

Vacuum Processing Logic
===================

PostgreSQL VACUUM processing logic consists of 2 phases,
1. Collecting dead tuple locations on heap.
2. Reclaiming dead tuples from heap and indexes.
These phases 1 and 2 are executed alternately, and once amount of dead
tuple location reached maintenance_work_mem in phase 1, phase 2 will
be executed.

Basic Design
==========

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
To use visibility map efficiency, each worker scan particular block
range of relation and collect dead tuple locations.
After each worker finished task, the leader process gathers these
vacuum statistics information and update relfrozenxid if possible.

I also changed the buffer lock infrastructure so that multiple
processes can wait for cleanup lock on a buffer.
And the new GUC parameter vacuum_parallel_workers controls the number
of vacuum workers.

Performance(PoC)
=========

I ran parallel vacuum on 13GB table (pgbench scale 1000) with several
workers (on my poor virtual machine).
The result is,

1. Vacuum whole table without index (disable page skipping)
1 worker : 33 sec
2 workers : 27 sec
3 workers : 23 sec
4 workers : 22 sec

2. Vacuum table and index (after 10000 transaction executed)
1 worker : 12 sec
2 workers : 49 sec
3 workers : 54 sec
4 workers : 53 sec

As a result of my test, since multiple process could frequently try to
acquire the cleanup lock on same index buffer, execution time of
parallel vacuum got worse.
And it seems to be effective for only table vacuum so far, but is not
improved as expected (maybe disk bottleneck).

Another Design
============
ISTM that processing index vacuum by multiple process is not good idea
in most cases because many index items can be stored in a page and
multiple vacuum worker could try to require the cleanup lock on the
same index buffer.
It's rather better that multiple workers process particular block
range and then multiple workers process each particular block range,
and then one worker per index processes index vacuum.

Still lots of work to do but attached PoC patch.
Feedback and suggestion are very welcome.

Regards,

--
Masahiko Sawada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Claudio Freire

klaussfreire@gmail.com

over 9 years ago

In reply to: Masahiko Sawada (#1)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 8:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

2. Vacuum table and index (after 10000 transaction executed)
1 worker : 12 sec
2 workers : 49 sec
3 workers : 54 sec
4 workers : 53 sec

As a result of my test, since multiple process could frequently try to
acquire the cleanup lock on same index buffer, execution time of
parallel vacuum got worse.
And it seems to be effective for only table vacuum so far, but is not
improved as expected (maybe disk bottleneck).

Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (one per
worker).

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Tom Lane

tgl@sss.pgh.pa.us

over 9 years ago

In reply to: Claudio Freire (#3)

Re: Block level parallel vacuum WIP

Claudio Freire <klaussfreire@gmail.com> writes:

Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (one per
worker).

What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alexander Korotkov

a.korotkov@postgrespro.ru

over 9 years ago

In reply to: Tom Lane (#4)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Claudio Freire <klaussfreire@gmail.com> writes:

Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (one per
worker).

What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.

+1
We could eventually need some effective way of parallelizing vacuum of
single index.
But pointing each worker at separate index seems to be fair enough for
majority of cases.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Masahiko Sawada (#1)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Alex Ignatov

a.ignatov@postgrespro.ru

over 9 years ago

In reply to: Michael Paquier (#6)

Re: Block level parallel vacuum WIP

On 23.08.2016 15:41, Michael Paquier wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.

Rotating disks is not a problem - you can always raid them and etc. 8k
allocation per relation once per half an hour that is the problem. Seq
scan is this way = random scan...

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Amit Kapila

amit.kapila16@gmail.com

over 9 years ago

In reply to: Michael Paquier (#6)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.

The implementation in patch is same as we have initially thought for
sequential scan, but turned out that it is not good way to do because
it can lead to inappropriate balance of work among workers. Suppose
one worker is able to finish it's work, it won't be able to do more.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Masahiko Sawada (#1)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'd like to propose block level parallel VACUUM.
This feature makes VACUUM possible to use multiple CPU cores.

Great. This is something that I have thought about, too. Andres and
Heikki recommended it as a project to me a few PGCons ago.

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
To use visibility map efficiency, each worker scan particular block
range of relation and collect dead tuple locations.
After each worker finished task, the leader process gathers these
vacuum statistics information and update relfrozenxid if possible.

This doesn't seem like a good design, because it adds a lot of extra
index scanning work. What I think you should do is:

1. Use a parallel heap scan (heap_beginscan_parallel) to let all
workers scan in parallel. Allocate a DSM segment to store the control
structure for this parallel scan plus an array for the dead tuple IDs
and a lock to protect the array.

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

Later, we can try to see if there's a way to have multiple workers
work together to vacuum a single index. But the above seems like a
good place to start.

I also changed the buffer lock infrastructure so that multiple
processes can wait for cleanup lock on a buffer.

You won't need this if you proceed as above, which is probably a good thing.

And the new GUC parameter vacuum_parallel_workers controls the number
of vacuum workers.

I suspect that for autovacuum there is little reason to use parallel
vacuum, since most of the time we are trying to slow vacuum down, not
speed it up. I'd be inclined, for starters, to just add a PARALLEL
option to the VACUUM command, for when people want to speed up
parallel vacuums. Perhaps

VACUUM (PARALLEL 4) relation;

...could mean to vacuum the relation with the given number of workers, and:

VACUUM (PARALLEL) relation;

...could mean to vacuum the relation in parallel with the system
choosing the number of workers - 1 worker per index is probably a good
starting formula, though it might need some refinement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#10

Masahiko Sawada

sawada.mshk@gmail.com

over 9 years ago

In reply to: Alexander Korotkov (#5)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 9:40 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:

On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Claudio Freire <klaussfreire@gmail.com> writes:

Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (one per
worker).

What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.

+1
We could eventually need some effective way of parallelizing vacuum of
single index.
But pointing each worker at separate index seems to be fair enough for
majority of cases.

Or we can improve vacuum of single index by changing data
representation of dead tuple to bitmap.
It can reduce the number of index whole scan during vacuum and make
comparing the index item to the dead tuples faster.
This is a listed on Todo list and I've implemented it.

Regards,

--
Masahiko Sawada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#11

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Robert Haas (#9)

Re: Block level parallel vacuum WIP

Robert Haas wrote:

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree. So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#12

Masahiko Sawada

sawada.mshk@gmail.com

over 9 years ago

In reply to: Robert Haas (#9)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 10:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'd like to propose block level parallel VACUUM.
This feature makes VACUUM possible to use multiple CPU cores.

Great. This is something that I have thought about, too. Andres and
Heikki recommended it as a project to me a few PGCons ago.

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
To use visibility map efficiency, each worker scan particular block
range of relation and collect dead tuple locations.
After each worker finished task, the leader process gathers these
vacuum statistics information and update relfrozenxid if possible.

This doesn't seem like a good design, because it adds a lot of extra
index scanning work. What I think you should do is:

1. Use a parallel heap scan (heap_beginscan_parallel) to let all
workers scan in parallel. Allocate a DSM segment to store the control
structure for this parallel scan plus an array for the dead tuple IDs
and a lock to protect the array.

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

Later, we can try to see if there's a way to have multiple workers
work together to vacuum a single index. But the above seems like a
good place to start.

Thank you for the advice.
That's a what I thought as an another design, I will change the patch
to this design.

I also changed the buffer lock infrastructure so that multiple
processes can wait for cleanup lock on a buffer.

You won't need this if you proceed as above, which is probably a good thing.

Right.

And the new GUC parameter vacuum_parallel_workers controls the number
of vacuum workers.

I suspect that for autovacuum there is little reason to use parallel
vacuum, since most of the time we are trying to slow vacuum down, not
speed it up. I'd be inclined, for starters, to just add a PARALLEL
option to the VACUUM command, for when people want to speed up
parallel vacuums. Perhaps

VACUUM (PARALLEL 4) relation;

...could mean to vacuum the relation with the given number of workers, and:

VACUUM (PARALLEL) relation;

...could mean to vacuum the relation in parallel with the system
choosing the number of workers - 1 worker per index is probably a good
starting formula, though it might need some refinement.

It looks convenient.
I was thinking that we can manage the number of parallel worker per
table using this parameter for autovacuum , like
ALTER TABLE relation SET (parallel_vacuum_workers = 2)

Regards,

--
Masahiko Sawada

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#13

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Alvaro Herrera (#11)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas wrote:

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree. So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.

Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#14

Alvaro Herrera

alvherre@2ndquadrant.com

over 9 years ago

In reply to: Robert Haas (#13)

Re: Block level parallel vacuum WIP

Robert Haas wrote:

On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas wrote:

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree. So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.

Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.

Sure, just putting the idea out there.

--
ï¿½lvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#15

Andres Freund

andres@anarazel.de

over 9 years ago

In reply to: Robert Haas (#13)

Re: Block level parallel vacuum WIP

On 2016-08-23 12:17:30 -0400, Robert Haas wrote:

On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:

Robert Haas wrote:

2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.

At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree. So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.

Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.

Given that index scan(s) are, in my experience, way more often the
bottleneck than the heap-scan(s), I'm not sure that order is the
best. The heap-scan benefits from the VM, the index scans don't.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#16

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Amit Kapila (#8)

Re: Block level parallel vacuum WIP

On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.

The implementation in patch is same as we have initially thought for
sequential scan, but turned out that it is not good way to do because
it can lead to inappropriate balance of work among workers. Suppose
one worker is able to finish it's work, it won't be able to do more.

Ah, so it was the reason. Thanks for confirming my doubts on what is proposed.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#17

Pavan Deolasee

pavan.deolasee@gmail.com

over 9 years ago

In reply to: Michael Paquier (#16)

Re: Block level parallel vacuum WIP

On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com>

wrote:

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.

The implementation in patch is same as we have initially thought for
sequential scan, but turned out that it is not good way to do because
it can lead to inappropriate balance of work among workers. Suppose
one worker is able to finish it's work, it won't be able to do more.

Ah, so it was the reason. Thanks for confirming my doubts on what is
proposed.
--

I believe Sawada-san has got enough feedback on the design to work out the
next steps. It seems natural that the vacuum workers are assigned a portion
of the heap to scan and collect dead tuples (similar to what patch does)
and the same workers to be responsible for the second phase of heap scan.

But as far as index scans are concerned, I agree with Tom that the best
strategy is to assign a different index to each worker process and let them
vacuum indexes in parallel. That way the work for each worker process is
clearly cut out and they don't contend for the same resources, which means
the first patch to allow multiple backends to wait for a cleanup buffer is
not required. Later we could extend it further such multiple workers can
vacuum a single index by splitting the work on physical boundaries, but
even that will ensure clear demarkation of work and hence no contention on
index blocks.

ISTM this will require further work and it probably makes sense to mark the
patch as "Returned with feedback" for now.

Thanks,
Pavan

--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

#18

Masahiko Sawada

sawada.mshk@gmail.com

over 9 years ago

In reply to: Pavan Deolasee (#17)

Re: Block level parallel vacuum WIP

On Sat, Sep 10, 2016 at 7:44 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:

On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:

On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:

On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada
<sawada.mshk@gmail.com> wrote:

As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).

So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.

The implementation in patch is same as we have initially thought for
sequential scan, but turned out that it is not good way to do because
it can lead to inappropriate balance of work among workers. Suppose
one worker is able to finish it's work, it won't be able to do more.

Ah, so it was the reason. Thanks for confirming my doubts on what is
proposed.
--

I believe Sawada-san has got enough feedback on the design to work out the
next steps. It seems natural that the vacuum workers are assigned a portion
of the heap to scan and collect dead tuples (similar to what patch does) and
the same workers to be responsible for the second phase of heap scan.

Yeah, thank you for the feedback.

But as far as index scans are concerned, I agree with Tom that the best
strategy is to assign a different index to each worker process and let them
vacuum indexes in parallel.
That way the work for each worker process is
clearly cut out and they don't contend for the same resources, which means
the first patch to allow multiple backends to wait for a cleanup buffer is
not required. Later we could extend it further such multiple workers can
vacuum a single index by splitting the work on physical boundaries, but even
that will ensure clear demarkation of work and hence no contention on index
blocks.

I also agree with this idea.
Each worker vacuums different indexes and then the leader process
should update all index statistics after parallel mode exited.

I'm implementing this patch but I need to resolve the problem
regarding lock for extension by multiple parallel workers.
In parallel vacuum, multiple workers could try to acquire the
exclusive lock for extension on same relation.
Since acquiring the exclusive lock for extension by multiple workers
is regarded as locking from same locking group, multiple workers
extend fsm or vm at the same time and end up with error.
I thought that it might be involved with parallel update operation, so
I'd like to discuss about this in advance.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#19

Robert Haas

robertmhaas@gmail.com

over 9 years ago

In reply to: Masahiko Sawada (#18)

Re: Block level parallel vacuum WIP

On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'm implementing this patch but I need to resolve the problem
regarding lock for extension by multiple parallel workers.
In parallel vacuum, multiple workers could try to acquire the
exclusive lock for extension on same relation.
Since acquiring the exclusive lock for extension by multiple workers
is regarded as locking from same locking group, multiple workers
extend fsm or vm at the same time and end up with error.
I thought that it might be involved with parallel update operation, so
I'd like to discuss about this in advance.

Hmm, yeah. This is one of the reasons why parallel queries currently
need to be entirely read-only. I think there's a decent argument that
the relation extension lock mechanism should be entirely redesigned:
the current system is neither particularly fast nor particularly
elegant, and some of the services that the heavyweight lock manager
provides, such as deadlock detection, are not relevant for relation
extension locks. I'm not sure if we should try to fix that right away
or come up with some special-purpose hack for vacuum, such as having
backends use condition variables to take turns calling
visibilitymap_set().

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#20

Masahiko Sawada

sawada.mshk@gmail.com

over 9 years ago

In reply to: Robert Haas (#19)

Re: Block level parallel vacuum WIP

On Thu, Sep 15, 2016 at 11:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I'm implementing this patch but I need to resolve the problem
regarding lock for extension by multiple parallel workers.
In parallel vacuum, multiple workers could try to acquire the
exclusive lock for extension on same relation.
Since acquiring the exclusive lock for extension by multiple workers
is regarded as locking from same locking group, multiple workers
extend fsm or vm at the same time and end up with error.
I thought that it might be involved with parallel update operation, so
I'd like to discuss about this in advance.

Hmm, yeah. This is one of the reasons why parallel queries currently
need to be entirely read-only. I think there's a decent argument that
the relation extension lock mechanism should be entirely redesigned:
the current system is neither particularly fast nor particularly
elegant, and some of the services that the heavyweight lock manager
provides, such as deadlock detection, are not relevant for relation
extension locks. I'm not sure if we should try to fix that right away
or come up with some special-purpose hack for vacuum, such as having
backends use condition variables to take turns calling
visibilitymap_set().

Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#21

Michael Paquier

michael.paquier@gmail.com

over 9 years ago

In reply to: Masahiko Sawada (#20)

Re: Block level parallel vacuum WIP

On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.

Marked as returned with feedback because of lack of activity and...
Feedback provided.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#22

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Michael Paquier (#21)

2 attachment(s)

Re: Block level parallel vacuum WIP

On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.

I got some advices at PGConf.ASIA 2016 and started to work on this again.

The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page. I'm not sure this is the
best idea but it's very simple and enough to support parallel vacuum.
More smart idea could be needed when we want to support parallel DML
and so on.

001 patch adds parallel option to VACUUM command. As Robert suggested
before, parallel option is set with parallel degree.

=# VACUUM (PARALLEL 4) table_name;

..means 4 background processes are launched and background process
executes lazy_scan_heap while the launcher (leader) process waits for
all vacuum workers finish. If N = 1 or without parallel option, leader
process itself executes lazy_scan_heap.

Internal Design
=============
I changed the parallel vacuum internal design. Collecting garbage on
table is processed in block-level parallel. For tables with indexes,
each index on table is assigned to each vacuum worker and all garbage
on a index are processed by particular assigned vacuum worker.

The all spaces for the array of dead tuple TIDs used by vacuum worker
are allocated in dynamic shared memory by launcher process. Vacuum
worker process stores dead tuple location into its dead tuple array
without lock, the TIDs in a dead tuple array are ordered by TID. Note
that entire space for dead tuple, that is a bunch of dead tuple array,
are not ordered.

If table has index, all dead tuple TIDs needs to be shared with all
vacuum workers before actual reclaiming dead tuples starts and these
data should be cleared after all vacuum worker finished to use them.
So I put two synchronization points at where before reclaiming dead
tuples and where after finished to reclaim them. At these points,
parallel vacuum worker waits for all other workers to reach to the
same point. Once all vacuum workers reached to same point, vacuum
worker resumes next operation.

For example, If a table has five indexes and we execute parallel lazy
vacuum on that table with three vacuum workers, two of three vacuum
workers are assigned two indexes and another one vacuum worker is
assigned to one indexes. After the amount of dead tuple of all vacuum
worker reached to maintenance_work_mem size vacuum worker starts to
reclaim dead tuple on table and index. A vacuum worker that is
assigned to one index finishes (probably first) and sleeps until other
two vacuum workers finish to vacuum. If table has no index then each
parallel vacuum worker vacuums each page as we go.

Performance
===========
I measured the execution time of vacuum on dirty table with several
parallel degree in my poor environment.

table_size | indexes | parallel_degree | time
------------+---------+-----------------+----------
6.5GB | 0 | 1 | 00:00:14
6.5GB | 0 | 2 | 00:00:02
6.5GB | 0 | 4 | 00:00:02
6.5GB | 1 | 1 | 00:00:13
6.5GB | 1 | 2 | 00:00:15
6.5GB | 1 | 4 | 00:00:18
6.5GB | 2 | 1 | 00:02:18
6.5GB | 2 | 2 | 00:00:38
6.5GB | 2 | 4 | 00:00:46
13GB | 0 | 1 | 00:03:52
13GB | 0 | 2 | 00:00:49
13GB | 0 | 4 | 00:00:50
13GB | 1 | 1 | 00:01:41
13GB | 1 | 2 | 00:01:59
13GB | 1 | 4 | 00:01:24
13GB | 2 | 1 | 00:12:42
13GB | 2 | 2 | 00:01:17
13GB | 2 | 4 | 00:02:12

In result of my measurement, vacuum execution time got better in some
cases but didn't improve in case where index = 1. I'll investigate the
cause.

ToDo
======
* Vacuum progress support.
* Storage parameter support, perhaps parallel_vacuum_workers parameter
which allows autovacuum to use parallel vacuum on specified table.

I register this to next CF.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

000_make_group_locking_conflict_extend_lock_v2.patchapplication/octet-stream; name=000_make_group_locking_conflict_extend_lock_v2.patchDownload

diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index e9703f1..dd27acf 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1354,6 +1354,17 @@ LockCheckConflicts(LockMethod lockMethodTable,
 	}
 
 	/*
+	 * If relation lock for extend, it's a conflict even in
+	 * group locking.
+	 */
+	if ((lock->tag).locktag_type == LOCKTAG_RELATION_EXTEND)
+	{
+		PROCLOCK_PRINT("LockCheckConflicts: conflicting (group)",
+					   proclock);
+		return STATUS_FOUND;
+	}
+
+	/*
 	 * Locks held in conflicting modes by members of our own lock group are
 	 * not real conflicts; we can subtract those out and see if we still have
 	 * a conflict.  This is O(N) in the number of processes holding or

001_parallel_vacuum_v2.patchapplication/octet-stream; name=001_parallel_vacuum_v2.patchDownload

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index f18180a..8f1dc7b 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
+VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | PARALLEL <replaceable class="PARAMETER">N</replaceable> | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable class="PARAMETER">table_name</replaceable> ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 </synopsis>
@@ -130,6 +130,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">
    </varlistentry>
 
    <varlistentry>
+    <term><literal>PARALLEL <replaceable class="PARAMETER">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable class="PARAMETER">N
+      </replaceable> background workers. Collecting garbage on table is processed
+      in block-level parallel. For tables with indexes, parallel vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a index are processed
+      by particular parallel vacuum worker. This option can not use with <literal>FULL</>
+      option.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>DISABLE_PAGE_SKIPPING</literal></term>
     <listitem>
      <para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1ce42ea..ff3f8d8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -88,7 +88,6 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
 						bool is_bitmapscan,
 						bool is_samplescan,
 						bool temp_snap);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1670,7 +1669,7 @@ heap_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
  *		first backend gets an InvalidBlockNumber return.
  * ----------------
  */
-static BlockNumber
+BlockNumber
 heap_parallelscan_nextpage(HeapScanDesc scan)
 {
 	BlockNumber page = InvalidBlockNumber;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0f72c1c..bfcb77a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -71,7 +71,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
 				  MultiXactId lastSaneMinMulti);
-static bool vacuum_rel(Oid relid, RangeVar *relation, int options,
+static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options,
 		   VacuumParams *params);
 
 /*
@@ -86,17 +86,17 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
 	VacuumParams params;
 
 	/* sanity checks on options */
-	Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
-	Assert((vacstmt->options & VACOPT_VACUUM) ||
-		   !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
-	Assert((vacstmt->options & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
-	Assert(!(vacstmt->options & VACOPT_SKIPTOAST));
+	Assert(vacstmt->options.flags & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert((vacstmt->options.flags & VACOPT_VACUUM) ||
+		   !(vacstmt->options.flags & (VACOPT_FULL | VACOPT_FREEZE)));
+	Assert((vacstmt->options.flags & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
+	Assert(!(vacstmt->options.flags & VACOPT_SKIPTOAST));
 
 	/*
 	 * All freeze ages are zero if the FREEZE option is given; otherwise pass
 	 * them as -1 which means to use the default values.
 	 */
-	if (vacstmt->options & VACOPT_FREEZE)
+	if (vacstmt->options.flags & VACOPT_FREEZE)
 	{
 		params.freeze_min_age = 0;
 		params.freeze_table_age = 0;
@@ -145,7 +145,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
  * memory context that will not disappear at transaction commit.
  */
 void
-vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
+vacuum(VacuumOptions options, RangeVar *relation, Oid relid, VacuumParams *params,
 	   List *va_cols, BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	const char *stmttype;
@@ -156,7 +156,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 
 	Assert(params != NULL);
 
-	stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
+	stmttype = (options.flags & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
 
 	/*
 	 * We cannot run VACUUM inside a user transaction block; if we were inside
@@ -166,7 +166,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	 *
 	 * ANALYZE (without VACUUM) can run either way.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 	{
 		PreventTransactionChain(isTopLevel, stmttype);
 		in_outer_xact = false;
@@ -188,17 +188,26 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
-	if ((options & VACOPT_FULL) != 0 &&
-		(options & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
 
 	/*
+	 * Sanity check PARALLEL option.
+	 */
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_PARALLEL) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("VACUUM option PARALLEL cannnot be used with FULL")));
+
+	/*
 	 * Send info about dead objects to the statistics collector, unless we are
 	 * in autovacuum --- autovacuum.c does this for itself.
 	 */
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 		pgstat_vacuum_stat();
 
 	/*
@@ -244,11 +253,11 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	 * transaction block, and also in an autovacuum worker, use own
 	 * transactions so we can release locks sooner.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 		use_own_xacts = true;
 	else
 	{
-		Assert(options & VACOPT_ANALYZE);
+		Assert(options.flags & VACOPT_ANALYZE);
 		if (IsAutoVacuumWorkerProcess())
 			use_own_xacts = true;
 		else if (in_outer_xact)
@@ -298,13 +307,13 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 		{
 			Oid			relid = lfirst_oid(cur);
 
-			if (options & VACOPT_VACUUM)
+			if (options.flags & VACOPT_VACUUM)
 			{
 				if (!vacuum_rel(relid, relation, options, params))
 					continue;
 			}
 
-			if (options & VACOPT_ANALYZE)
+			if (options.flags & VACOPT_ANALYZE)
 			{
 				/*
 				 * If using separate xacts, start one for analyze. Otherwise,
@@ -317,7 +326,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 					PushActiveSnapshot(GetTransactionSnapshot());
 				}
 
-				analyze_rel(relid, relation, options, params,
+				analyze_rel(relid, relation, options.flags, params,
 							va_cols, in_outer_xact, vac_strategy);
 
 				if (use_own_xacts)
@@ -353,7 +362,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 		StartTransactionCommand();
 	}
 
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 	{
 		/*
 		 * Update pg_database.datfrozenxid, and truncate pg_clog if possible.
@@ -1183,7 +1192,7 @@ vac_truncate_clog(TransactionId frozenXID,
  *		At entry and exit, we are not inside a transaction.
  */
 static bool
-vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
+vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options, VacuumParams *params)
 {
 	LOCKMODE	lmode;
 	Relation	onerel;
@@ -1204,7 +1213,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 */
 	PushActiveSnapshot(GetTransactionSnapshot());
 
-	if (!(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_FULL))
 	{
 		/*
 		 * In lazy vacuum, we can set the PROC_IN_VACUUM flag, which lets
@@ -1244,7 +1253,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * vacuum, but just ShareUpdateExclusiveLock for concurrent vacuum. Either
 	 * way, we can be sure that no other backend is vacuuming the same table.
 	 */
-	lmode = (options & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+	lmode = (options.flags & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * Open the relation and get the appropriate lock on it.
@@ -1255,7 +1264,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * If we've been asked not to wait for the relation lock, acquire it first
 	 * in non-blocking mode, before calling try_relation_open().
 	 */
-	if (!(options & VACOPT_NOWAIT))
+	if (!(options.flags & VACOPT_NOWAIT))
 		onerel = try_relation_open(relid, lmode);
 	else if (ConditionalLockRelationOid(relid, lmode))
 		onerel = try_relation_open(relid, NoLock);
@@ -1359,7 +1368,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * us to process it.  In VACUUM FULL, though, the toast table is
 	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
 	 */
-	if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_SKIPTOAST) && !(options.flags & VACOPT_FULL))
 		toast_relid = onerel->rd_rel->reltoastrelid;
 	else
 		toast_relid = InvalidOid;
@@ -1378,7 +1387,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	/*
 	 * Do the actual work --- either FULL or "lazy" vacuum
 	 */
-	if (options & VACOPT_FULL)
+	if (options.flags & VACOPT_FULL)
 	{
 		/* close relation before vacuuming, but hold lock until commit */
 		relation_close(onerel, NoLock);
@@ -1386,7 +1395,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 		cluster_rel(relid, InvalidOid, false,
-					(options & VACOPT_VERBOSE) != 0);
+					(options.flags & VACOPT_VERBOSE) != 0);
 	}
 	else
 		lazy_vacuum_rel(onerel, options, params, vac_strategy);
@@ -1440,8 +1449,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
  * hit dangling index pointers.
  */
 void
-vac_open_indexes(Relation relation, LOCKMODE lockmode,
-				 int *nindexes, Relation **Irel)
+vac_open_indexes(Relation relation, LOCKMODE lockmode, int *nindexes, Relation **Irel)
 {
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index a2999b3..b5e6eed 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -23,6 +23,22 @@
  * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * In PostgreSQL 10, we support parallel option for lazy vacuum. In parallel
+ * lazy vacuum the multiple vacuum worker processes get page number in parallel
+ * using parallel heap scan and process it.  The memory spaces for the array
+ * of dead tuple TIDs of each worker are allocated in dynamic shared memory in
+ * advance by launcher process.  The vacuum workers has its vacuum state and
+ * round.  Since the vacuum state is the cyclical state the round value indicates
+ * how many laps vacuum worker did so far.  Vacuum worker increments its
+ * round after finished the reclaim phase.  For tables with indexes, each index
+ * on table is assigned to each vacuum worker.  That is, the number of indexes
+ * assigned could be different between vacuum workers.  Because the dead tuple
+ * TIDs need to be shared with all vacuum workers in order to reclaim index
+ * garbage and be cleared after all vacuum workers finished the reclaim phase,
+ * vacuum worker synchronizes other vacuum workers at two points where before
+ * reclaim phase begins and where after reclaim phase finished.  After all of
+ * vacuum workers finished to work, the launcher process gathers the lazy vacuum
+ * statistics and update them.
  *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -42,8 +58,11 @@
 #include "access/heapam_xlog.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/parallel.h"
+#include "access/relscan.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -55,6 +74,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "utils/lsyscache.h"
@@ -98,10 +118,72 @@
  */
 #define SKIP_PAGES_THRESHOLD	((BlockNumber) 32)
 
+/* DSM key for parallel lazy vacuum */
+#define VACUUM_KEY_PARALLEL_SCAN	50
+#define VACUUM_KEY_VACUUM_STATS		51
+#define VACUUM_KEY_INDEX_STATS		52
+#define VACUUM_KEY_DEAD_TUPLES		53
+#define VACUUM_KEY_VACUUM_TASK		54
+#define VACUUM_KEY_PARALLEL_STATE	55
+
+/*
+ * see note of lazy_scan_heap_get_nextpage about forcing scanning of
+ * last page
+ */
+#define FORCE_CHECK_PAGE(blk) \
+	(blkno == (blk - 1) && should_attempt_truncation(vacrelstats))
+
+/* Check if given index is assigned to this parallel vacuum worker */
+#define IsAssignedIndex(i_num, nworkers) \
+	(!IsParallelWorker() ||\
+	 (IsParallelWorker() && ((i_num) % (nworkers) == ParallelWorkerNumber)))
+
+/* Data structure for updating index relation statistics */
+typedef struct LVIndStats
+{
+	bool		do_update;	/* Launcher process will update? */
+	BlockNumber	rel_pages;	/* # of index pages */
+	BlockNumber rel_tuples;	/* # of index tuples */
+} LVIndStats;
+
+/* Vacuum worker state for parallel lazy vacuum */
+#define VACSTATE_STARTUP			0x01	/* startup state */
+#define VACSTATE_SCANNING			0x02	/* heap scan phase */
+#define VACSTATE_VACUUM_PREPARED	0x04	/* finished to scan heap */
+#define VACSTATE_VACUUMING			0x08	/* vacuuming on table and index */
+#define VACSTATE_VACUUM_FINISHED	0x10	/* finished to vacuum */
+#define VACSTATE_COMPLETE			0x20	/* complete to vacuum */
+
+/* Vacuum phase for parallel lazy vacuum */
+#define VACPHASE_SCAN\
+	(VACSTATE_SCANNING | VACSTATE_VACUUM_PREPARED)
+#define VACPHASE_VACUUM \
+	(VACSTATE_VACUUM_PREPARED | VACSTATE_VACUUMING | VACSTATE_VACUUM_FINISHED)
+
+typedef struct VacWorker
+{
+	uint8 state;	/* current state of vacuum worker */
+	uint32 round;	/* current laps */
+} VacWorker;
+
+typedef struct LVParallelState
+{
+	int nworkers;			/* # of parallel vacuum workers */
+	ConditionVariable cv;	/* condition variable for making synchronization points*/
+	slock_t	mutex;			/* mutex for vacworkers state */
+	VacWorker vacworkers[FLEXIBLE_ARRAY_MEMBER];
+	/* each vacuum workers state follows */
+} LVParallelState;
+
+typedef struct LVDeadTuple
+{
+	int n_dt; 				/* # of dead tuple */
+	ItemPointer dt_array;	/* NB: Each list is ordered by TID address */
+} LVDeadTuple;
+
 typedef struct LVRelStats
 {
-	/* hasindex = true means two-pass strategy; false means one-pass */
-	bool		hasindex;
+	int			nindexes; /* > 0 means two-pass strategy; = 0 means one-pass */
 	/* Overall statistics about rel */
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
@@ -116,15 +198,42 @@ typedef struct LVRelStats
 	double		tuples_deleted;
 	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	/* List of TIDs of tuples we intend to delete */
-	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
+	LVDeadTuple *dead_tuples;
 	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
 	int			num_index_scans;
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
+	/* Fields for parallel lazy vacuum */
+	LVIndStats	*vacindstats;
+	LVParallelState *pstate;
 } LVRelStats;
 
+/*
+ * Scan description data for lazy vacuum. In parallel lazy vacuum,
+ * we use only heapscan instead.
+ */
+typedef struct LVScanDescData
+{
+	BlockNumber lv_cblock;					/* current scanning block number */
+	BlockNumber lv_next_unskippable_block;	/* next block number we cannot skip */
+	BlockNumber lv_nblocks;					/* the number blocks of relation */
+	HeapScanDesc heapscan;					/* field for parallel lazy vacuum */
+} LVScanDescData;
+typedef struct LVScanDescData *LVScanDesc;
+
+/*
+ * Vacuum relevant options and thresholds we need share with parallel
+ * vacuum workers.
+ */
+typedef struct VacuumTask
+{
+	int				options;	/* VACUUM optoins */
+	bool			aggressive;	/* does each worker need to aggressive vacuum? */
+	TransactionId	oldestxmin;
+	TransactionId	freezelimit;
+	MultiXactId		multixactcutoff;
+	int				elevel;
+} VacuumTask;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -135,11 +244,10 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
+static LVDeadTuple *MyDeadTuple = NULL; /* pointer to my dead tuple space */
+static VacWorker *MyVacWorker = NULL;	/* pointer to my vacuum worker state */
 
 /* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
-			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
-			   bool aggressive);
 static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
@@ -147,21 +255,54 @@ static void lazy_vacuum_index(Relation indrel,
 				  LVRelStats *vacrelstats);
 static void lazy_cleanup_index(Relation indrel,
 				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats);
+				   LVRelStats *vacrelstats,
+				   LVIndStats *vacindstats);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+							int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
 						 LVRelStats *vacrelstats);
 static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
-static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr);
+static void lazy_record_dead_tuple(LVRelStats *vacrelstats, ItemPointer itemptr);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 					 TransactionId *visibility_cutoff_xid, bool *all_frozen);
-
+static void lazy_scan_heap(LVRelStats *vacrelstats, Relation onerel,
+						   Relation *Irels, int nindexes,ParallelHeapScanDesc pscan,
+						   int options, bool aggressive);
+
+/* function prototypes for parallel vacuum */
+static void parallel_lazy_scan_heap(Relation rel, LVRelStats *vacrelstats,
+									int options, bool aggressive, int wnum);
+static void lazy_vacuum_worker(dsm_segment *seg, shm_toc *toc);
+static void lazy_gather_vacuum_stats(ParallelContext *pxct,
+									 LVRelStats *valrelstats,
+									 LVIndStats *vacindstats);
+static void lazy_update_index_stats(Relation onerel, LVIndStats *vacindstats);
+static void lazy_estimate_dsm(ParallelContext *pcxt, long maxtuples, int nindexes);
+static void lazy_initialize_dsm(ParallelContext *pcxt, Relation onrel,
+								LVRelStats *vacrelstats, int options,
+								bool aggressive);
+static void lazy_initialize_worker(shm_toc *toc, ParallelHeapScanDesc *pscan,
+								   LVRelStats **vacrelstats, int *options,
+								   bool *aggressive);
+static void lazy_clear_dead_tuple(LVRelStats *vacrelstats);
+static LVScanDesc lv_beginscan(LVRelStats *vacrelstats, ParallelHeapScanDesc pscan,
+							   Relation onerel);
+static void lv_endscan(LVScanDesc lvscan);
+static BlockNumber lazy_scan_heap_get_nextpage(Relation onerel, LVRelStats* vacrelstats,
+											   LVScanDesc lvscan, bool *all_visible_according_to_vm,
+											   Buffer *vmbuffer, int options, bool aggressive);
+static void lazy_set_vacstate_and_wait_prepared(LVParallelState *pstate);
+static void lazy_set_vacstate_and_wait_finished(LVRelStats *vacrelstats);
+static bool lazy_check_vacstate_prepared(LVParallelState *pstate, uint32 round);
+static bool lazy_check_vacstate_finished(LVParallelState *pstate, uint32 round);
+static int  lazy_count_vacstate_finished(LVParallelState *pstate, uint32 round, int *n_complete);
+static uint32 lazy_set_my_vacstate(LVParallelState *pstate, uint8 state, bool nextloop,
+								 bool broadcast);
+static long lazy_get_max_dead_tuple(LVRelStats *vacrelstats);
 
 /*
  *	lazy_vacuum_rel() -- perform LAZY VACUUM for one heap relation
@@ -173,7 +314,7 @@ static bool heap_page_is_all_visible(Relation rel, Buffer buf,
  *		and locked the relation.
  */
 void
-lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+lazy_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
 				BufferAccessStrategy bstrategy)
 {
 	LVRelStats *vacrelstats;
@@ -205,7 +346,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		starttime = GetCurrentTimestamp();
 	}
 
-	if (options & VACOPT_VERBOSE)
+	if (options.flags & VACOPT_VERBOSE)
 		elevel = INFO;
 	else
 		elevel = DEBUG2;
@@ -233,7 +374,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 											   xidFullScanLimit);
 	aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
 											  mxactFullScanLimit);
-	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+	if (options.flags & VACOPT_DISABLE_PAGE_SKIPPING)
 		aggressive = true;
 
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
@@ -244,15 +385,26 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
-	/* Open all indexes of the relation */
-	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
-	vacrelstats->hasindex = (nindexes > 0);
+	if (options.nworkers > 1)
+	{
+		vacrelstats->nindexes = list_length(RelationGetIndexList(onerel));
 
-	/* Do the vacuuming */
-	lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
+		/* Do the vacuuming in parallel */
+		parallel_lazy_scan_heap(onerel, vacrelstats, options.flags, aggressive,
+								options.nworkers);
+	}
+	else
+	{
+		vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
+		vacrelstats->nindexes = nindexes;
 
-	/* Done with indexes */
-	vac_close_indexes(nindexes, Irel, NoLock);
+		/* Do the vacuuming */
+		lazy_scan_heap(vacrelstats, onerel, Irel, nindexes, NULL,
+					   options.flags, aggressive);
+
+		/* Done with indexes */
+		vac_close_indexes(nindexes, Irel, RowExclusiveLock);
+	}
 
 	/*
 	 * Compute whether we actually scanned the all unfrozen pages. If we did,
@@ -319,7 +471,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 						new_rel_pages,
 						new_rel_tuples,
 						new_rel_allvisible,
-						vacrelstats->hasindex,
+						(vacrelstats->nindexes != 0),
 						new_frozen_xid,
 						new_min_multi,
 						false);
@@ -428,28 +580,121 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
+ * Launch parallel vacuum workers specified by wnum and wait for all vacuum
+ * workers finish. Before launch vacuum worker we initialize dynamic shared memory
+ * and stores relevant data to it. After all workers finished we gather the vacuum
+ * statistics of all vacuum workers.
+ */
+static void
+parallel_lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
+						int options, bool aggressive, int wnum)
+{
+	ParallelContext	*pcxt;
+	long maxtuples;
+	LVIndStats *vacindstats;
+
+	vacindstats = (LVIndStats *) palloc(sizeof(LVIndStats) * vacrelstats->nindexes);
+
+	EnterParallelMode();
+
+	/* Create parallel context and initialize it */
+	pcxt = CreateParallelContext(lazy_vacuum_worker, wnum);
+
+	/* Estimate DSM size for parallel vacuum */
+	maxtuples = (int) lazy_get_max_dead_tuple(vacrelstats);
+	vacrelstats->max_dead_tuples = maxtuples;
+	lazy_estimate_dsm(pcxt, maxtuples, vacrelstats->nindexes);
+
+	/* Initialize DSM for parallel vacuum */
+	InitializeParallelDSM(pcxt);
+	lazy_initialize_dsm(pcxt, onerel, vacrelstats, options, aggressive);
+
+	/* Launch workers */
+	LaunchParallelWorkers(pcxt);
+
+	/* Wait for workers finished vacuum */
+	WaitForParallelWorkersToFinish(pcxt);
+
+	/* Gather the result of vacuum statistics from all workers */
+	lazy_gather_vacuum_stats(pcxt, vacrelstats, vacindstats);
+
+	/* Now we can compute the new value for pg_class.reltuples */
+	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+														 vacrelstats->rel_pages,
+														 vacrelstats->scanned_pages,
+														 vacrelstats->scanned_tuples);
+	DestroyParallelContext(pcxt);
+	ExitParallelMode();
+
+	/* After parallel mode, we can update index statistics */
+	lazy_update_index_stats(onerel, vacindstats);
+}
+
+/*
+ * Entry point of parallel vacuum worker.
+ */
+static void
+lazy_vacuum_worker(dsm_segment *seg, shm_toc *toc)
+{
+	ParallelHeapScanDesc pscan;
+	LVRelStats *vacrelstats;
+	int options;
+	bool aggressive;
+	Relation rel;
+	Relation *indrel;
+	int nindexes_worker;
+
+	/* Look up and initialize information and task */
+	lazy_initialize_worker(toc, &pscan, &vacrelstats, &options,
+							   &aggressive);
+
+	rel = relation_open(pscan->phs_relid, ShareUpdateExclusiveLock);
+
+	/* Open all indexes */
+	vac_open_indexes(rel, RowExclusiveLock, &nindexes_worker,
+					 &indrel);
+
+	/* Do lazy vacuum */
+	lazy_scan_heap(vacrelstats, rel, indrel, vacrelstats->nindexes,
+				   pscan, options, aggressive);
+
+	heap_close(rel, ShareUpdateExclusiveLock);
+	vac_close_indexes(vacrelstats->nindexes, indrel, RowExclusiveLock);
+}
+
+/*
  *	lazy_scan_heap() -- scan an open heap relation
  *
  *		This routine prunes each page in the heap, which will among other
  *		things truncate dead tuples to dead line pointers, defragment the
- *		page, and set commit status bits (see heap_page_prune).  It also builds
+ *		page, and set commit status bits (see heap_page_prune).  It also uses
  *		lists of dead tuples and pages with free space, calculates statistics
  *		on the number of live tuples in the heap, and marks pages as
  *		all-visible if appropriate.  When done, or when we run low on space for
- *		dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
- *		to reclaim dead line pointers.
+ *		dead-tuple TIDs, invoke vacuuming of assigned indexes and call lazy_vacuum_heap
+ *		to reclaim dead line pointers. In parallel vacuum, we need to synchronize
+ *		at where scanning heap finished and vacuuming heap finished. The vacuum
+ *		worker reached to that point first need to wait for other vacuum workers
+ *		reached to the same point.
+ *
+ *		In parallel lazy scan, pscan is not NULL and we get next page number
+ *		using parallel heap scan. We make two synchronization points at where
+ *		before reclaiming dead tuple actually and after reclaimed them.
  *
  *		If there are no indexes then we can reclaim line pointers on the fly;
  *		dead line pointers need only be retained until all index pointers that
  *		reference them have been killed.
  */
 static void
-lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive)
+lazy_scan_heap(LVRelStats *vacrelstats, Relation onerel, Relation *Irel,
+			   int nindexes, ParallelHeapScanDesc pscan, int options,
+			   bool aggressive)
 {
-	BlockNumber nblocks,
-				blkno;
+	BlockNumber blkno;
+	BlockNumber nblocks;
 	HeapTupleData tuple;
+	LVScanDesc lvscan;
+	LVIndStats *vacindstats;
 	char	   *relname;
 	BlockNumber empty_pages,
 				vacuumed_pages;
@@ -461,10 +706,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	int			i;
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
-	BlockNumber next_unskippable_block;
-	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
+	bool		all_visible_according_to_vm = false;
+
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -482,11 +727,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	empty_pages = vacuumed_pages = 0;
 	num_tuples = tups_vacuumed = nkeep = nunused = 0;
+	nblocks = RelationGetNumberOfBlocks(onerel);
 
 	indstats = (IndexBulkDeleteResult **)
-		palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
+		palloc0(sizeof(IndexBulkDeleteResult *) * nindexes);
 
-	nblocks = RelationGetNumberOfBlocks(onerel);
 	vacrelstats->rel_pages = nblocks;
 	vacrelstats->scanned_pages = 0;
 	vacrelstats->nonempty_pages = 0;
@@ -495,86 +740,24 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	lazy_space_alloc(vacrelstats, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
+	/* Array of index vacuum statistics */
+	vacindstats = vacrelstats->vacindstats;
+
+	/* Begin heap scan for vacuum */
+	lvscan = lv_beginscan(vacrelstats, pscan, onerel);
+
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
 	initprog_val[2] = vacrelstats->max_dead_tuples;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
-	/*
-	 * Except when aggressive is set, we want to skip pages that are
-	 * all-visible according to the visibility map, but only when we can skip
-	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
-	 * sequentially, the OS should be doing readahead for us, so there's no
-	 * gain in skipping a page now and then; that's likely to disable
-	 * readahead and so be counterproductive. Also, skipping even a single
-	 * page means that we can't update relfrozenxid, so we only want to do it
-	 * if we can skip a goodly number of pages.
-	 *
-	 * When aggressive is set, we can't skip pages just because they are
-	 * all-visible, but we can still skip pages that are all-frozen, since
-	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
-	 *
-	 * Before entering the main loop, establish the invariant that
-	 * next_unskippable_block is the next block number >= blkno that's not we
-	 * can't skip based on the visibility map, either all-visible for a
-	 * regular scan or all-frozen for an aggressive scan.  We set it to
-	 * nblocks if there's no such block.  We also set up the skipping_blocks
-	 * flag correctly at this stage.
-	 *
-	 * Note: The value returned by visibilitymap_get_status could be slightly
-	 * out-of-date, since we make this test before reading the corresponding
-	 * heap page or locking the buffer.  This is OK.  If we mistakenly think
-	 * that the page is all-visible or all-frozen when in fact the flag's just
-	 * been cleared, we might fail to vacuum the page.  It's easy to see that
-	 * skipping a page when aggressive is not set is not a very big deal; we
-	 * might leave some dead tuples lying around, but the next vacuum will
-	 * find them.  But even when aggressive *is* set, it's still OK if we miss
-	 * a page whose all-frozen marking has just been cleared.  Any new XIDs
-	 * just added to that page are necessarily newer than the GlobalXmin we
-	 * computed, so they'll have no effect on the value to which we can safely
-	 * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
-	 *
-	 * We will scan the table's last page, at least to the extent of
-	 * determining whether it has tuples or not, even if it should be skipped
-	 * according to the above rules; except when we've already determined that
-	 * it's not worth trying to truncate the table.  This avoids having
-	 * lazy_truncate_heap() take access-exclusive lock on the table to attempt
-	 * a truncation that just fails immediately because there are tuples in
-	 * the last page.  This is worth avoiding mainly because such a lock must
-	 * be replayed on any hot standby, where it can be disruptive.
-	 */
-	next_unskippable_block = 0;
-	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-	{
-		while (next_unskippable_block < nblocks)
-		{
-			uint8		vmstatus;
+	lazy_set_my_vacstate(vacrelstats->pstate, VACSTATE_SCANNING, false, false);
 
-			vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
-												&vmbuffer);
-			if (aggressive)
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-					break;
-			}
-			else
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-					break;
-			}
-			vacuum_delay_point();
-			next_unskippable_block++;
-		}
-	}
-
-	if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
-		skipping_blocks = true;
-	else
-		skipping_blocks = false;
-
-	for (blkno = 0; blkno < nblocks; blkno++)
+	while((blkno = lazy_scan_heap_get_nextpage(onerel, vacrelstats, lvscan,
+											   &all_visible_according_to_vm,
+											   &vmbuffer, options, aggressive)) !=
+		  InvalidBlockNumber)
 	{
 		Buffer		buf;
 		Page		page;
@@ -585,100 +768,21 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		int			prev_dead_count;
 		int			nfrozen;
 		Size		freespace;
-		bool		all_visible_according_to_vm = false;
 		bool		all_visible;
 		bool		all_frozen = true;	/* provided all_visible is also true */
 		bool		has_dead_tuples;
 		TransactionId visibility_cutoff_xid = InvalidTransactionId;
 
-		/* see note above about forcing scanning of last page */
-#define FORCE_CHECK_PAGE() \
-		(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
-
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
-		if (blkno == next_unskippable_block)
-		{
-			/* Time to advance next_unskippable_block */
-			next_unskippable_block++;
-			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-			{
-				while (next_unskippable_block < nblocks)
-				{
-					uint8		vmskipflags;
-
-					vmskipflags = visibilitymap_get_status(onerel,
-													  next_unskippable_block,
-														   &vmbuffer);
-					if (aggressive)
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-							break;
-					}
-					else
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-							break;
-					}
-					vacuum_delay_point();
-					next_unskippable_block++;
-				}
-			}
-
-			/*
-			 * We know we can't skip the current block.  But set up
-			 * skipping_all_visible_blocks to do the right thing at the
-			 * following blocks.
-			 */
-			if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
-				skipping_blocks = true;
-			else
-				skipping_blocks = false;
-
-			/*
-			 * Normally, the fact that we can't skip this block must mean that
-			 * it's not all-visible.  But in an aggressive vacuum we know only
-			 * that it's not all-frozen, so it might still be all-visible.
-			 */
-			if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
-				all_visible_according_to_vm = true;
-		}
-		else
-		{
-			/*
-			 * The current block is potentially skippable; if we've seen a
-			 * long enough run of skippable blocks to justify skipping it, and
-			 * we're not forced to check it, then go ahead and skip.
-			 * Otherwise, the page must be at least all-visible if not
-			 * all-frozen, so we can set all_visible_according_to_vm = true.
-			 */
-			if (skipping_blocks && !FORCE_CHECK_PAGE())
-			{
-				/*
-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was all-frozen, so we have to recheck; but
-				 * in this case an approximate answer is OK.
-				 */
-				if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
-					vacrelstats->frozenskipped_pages++;
-				continue;
-			}
-			all_visible_according_to_vm = true;
-		}
-
 		vacuum_delay_point();
 
 		/*
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
 		 */
-		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
-			vacrelstats->num_dead_tuples > 0)
+		if ((vacrelstats->max_dead_tuples - MyDeadTuple->n_dt) < MaxHeapTuplesPerPage &&
+			MyDeadTuple->n_dt > 0)
 		{
 			const int	hvp_index[] = {
 				PROGRESS_VACUUM_PHASE,
@@ -687,6 +791,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			int64		hvp_val[2];
 
 			/*
+			 * Here scanning heap is done and we are going to reclaim dead
+			 * tuples actually. Because other vacuum worker could not finished
+			 * yet, we wait for all other workers finish.
+			 */
+			lazy_set_vacstate_and_wait_prepared(vacrelstats->pstate);
+
+			/*
 			 * Before beginning index vacuuming, we release any pin we may
 			 * hold on the visibility map page.  This isn't necessary for
 			 * correctness, but we do it anyway to avoid holding the pin
@@ -705,11 +816,12 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
 
-			/* Remove index entries */
+			/* Remove assigned index entries */
 			for (i = 0; i < nindexes; i++)
-				lazy_vacuum_index(Irel[i],
-								  &indstats[i],
-								  vacrelstats);
+			{
+				if (IsAssignedIndex(i, vacrelstats->pstate->nworkers))
+					lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);
+			}
 
 			/*
 			 * Report that we are now vacuuming the heap.  We also increase
@@ -724,17 +836,28 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			/* Remove tuples from heap */
 			lazy_vacuum_heap(onerel, vacrelstats);
 
+
+			/* Report that we are once again scanning the heap */
+			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
 			/*
 			 * Forget the now-vacuumed tuples, and press on, but be careful
 			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
+			 * valid. In parallel lazy vacuum, we do that later process.
 			 */
-			vacrelstats->num_dead_tuples = 0;
-			vacrelstats->num_index_scans++;
+			if (vacrelstats->pstate == NULL)
+				lazy_clear_dead_tuple(vacrelstats);
 
-			/* Report that we are once again scanning the heap */
-			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
-										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+			/*
+			 * Here we've done vacuum on the heap and index and we are going
+			 * to begin the next round scan on heap. Wait until all vacuum worker
+			 * finished vacuum. After all vacuum workers finished, all of dead
+			 * tuple arrays are cleared by a process.
+			 */
+			lazy_set_vacstate_and_wait_finished(vacrelstats);
+
+			vacrelstats->num_index_scans++;
 		}
 
 		/*
@@ -760,7 +883,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * it's OK to skip vacuuming pages we get a lock conflict on. They
 			 * will be dealt with in some future vacuum.
 			 */
-			if (!aggressive && !FORCE_CHECK_PAGE())
+			if (!aggressive && !FORCE_CHECK_PAGE(blkno))
 			{
 				ReleaseBuffer(buf);
 				vacrelstats->pinskipped_pages++;
@@ -911,7 +1034,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
+		prev_dead_count = MyDeadTuple->n_dt;
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -1120,10 +1243,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/*
 		 * If there are no indexes then we can vacuum the page right now
-		 * instead of doing a second scan.
+		 * instead of doing a second scan. Because each parallel worker uses its
+		 * own dead tuple area they can vacuum independently.
 		 */
-		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
+		if (Irel == NULL && MyDeadTuple->n_dt > 0)
 		{
 			/* Remove tuples from heap */
 			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
@@ -1134,7 +1257,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lazy_clear_dead_tuple(vacrelstats);
 			vacuumed_pages++;
 		}
 
@@ -1237,7 +1360,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * page, so remember its free space as-is.  (This path will always be
 		 * taken if there are no indexes.)
 		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		if (MyDeadTuple->n_dt == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
@@ -1252,10 +1375,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_dead_tuples = nkeep;
 
 	/* now we can compute the new value for pg_class.reltuples */
-	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
-														 nblocks,
-												  vacrelstats->scanned_pages,
-														 num_tuples);
+	if (vacrelstats->pstate == NULL)
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 nblocks,
+															 vacrelstats->scanned_pages,
+															 num_tuples);
 
 	/*
 	 * Release any remaining pin on visibility map page.
@@ -1268,7 +1392,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (vacrelstats->num_dead_tuples > 0)
+	if (MyDeadTuple->n_dt > 0)
 	{
 		const int	hvp_index[] = {
 			PROGRESS_VACUUM_PHASE,
@@ -1276,6 +1400,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		};
 		int64		hvp_val[2];
 
+		/*
+		 * Here, scanning heap is done and going to reclaim dead tuples
+		 * actually. Because other vacuum worker might not finished yet,
+		 * we need to wait for other workers finish.
+		 */
+		lazy_set_vacstate_and_wait_prepared(vacrelstats->pstate);
+
 		/* Log cleanup info before we touch indexes */
 		vacuum_log_cleanup_info(onerel, vacrelstats);
 
@@ -1285,9 +1416,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/* Remove index entries */
 		for (i = 0; i < nindexes; i++)
-			lazy_vacuum_index(Irel[i],
-							  &indstats[i],
-							  vacrelstats);
+		{
+			if (IsAssignedIndex(i, vacrelstats->pstate->nworkers))
+				lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);
+		}
 
 		/* Report that we are now vacuuming the heap */
 		hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
@@ -1297,10 +1429,22 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		/* Remove tuples from heap */
 		pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 									 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
+
 		lazy_vacuum_heap(onerel, vacrelstats);
+
+		/*
+		 * Here, we've done to vacuum on heap and going to begin the next
+		 * scan on heap. Wait until all vacuum workers finish vacuum.
+		 * Once all vacuum workers finished, one of the vacuum worker clears
+		 * dead tuple array.
+		 */
+		lazy_set_vacstate_and_wait_finished(vacrelstats);
 		vacrelstats->num_index_scans++;
 	}
 
+	/* Change my vacstate to Complete */
+	lazy_set_my_vacstate(vacrelstats->pstate, VACSTATE_COMPLETE, false, true);
+
 	/* report all blocks vacuumed; and that we're cleaning up */
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
@@ -1308,7 +1452,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
-		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+	{
+		if (IsAssignedIndex(i, vacrelstats->pstate->nworkers))
+			lazy_cleanup_index(Irel[i], indstats[i], vacrelstats, &vacindstats[i]);
+	}
 
 	/* If no indexes, make log report that lazy_vacuum_heap would've made */
 	if (vacuumed_pages)
@@ -1317,6 +1464,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						RelationGetRelationName(onerel),
 						tups_vacuumed, vacuumed_pages)));
 
+	lv_endscan(lvscan);
+
 	/*
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
@@ -1347,6 +1496,81 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pfree(buf.data);
 }
 
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+lazy_gather_vacuum_stats(ParallelContext *pcxt, LVRelStats *vacrelstats,
+									LVIndStats *vacindstats)
+{
+	int	i;
+	LVRelStats *lvstats_list;
+	LVIndStats *lvindstats_list;
+
+	lvstats_list = (LVRelStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_VACUUM_STATS);
+	lvindstats_list = (LVIndStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_INDEX_STATS);
+
+	/* Gather each worker stats */
+	for (i = 0; i < pcxt->nworkers; i++)
+	{
+		LVRelStats *wstats = lvstats_list + sizeof(LVRelStats) * i;
+
+		vacrelstats->scanned_pages += wstats->scanned_pages;
+		vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+		vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+		vacrelstats->scanned_tuples += wstats->scanned_tuples;
+		vacrelstats->new_dead_tuples += wstats->new_dead_tuples;
+		vacrelstats->pages_removed += wstats->pages_removed;
+		vacrelstats->tuples_deleted += wstats->tuples_deleted;
+		vacrelstats->nonempty_pages += wstats->nonempty_pages;
+	}
+
+	/* all vacuum workers have same value of rel_pages */
+	vacrelstats->rel_pages = lvstats_list->rel_pages;
+
+	/* Copy index vacuum statistics on DSM to local memory */
+	memcpy(vacindstats, lvindstats_list, sizeof(LVIndStats) * vacrelstats->nindexes);
+}
+
+/*
+ * lazy_update_index_stats() -- Update index vacuum statistics
+ *
+ * This routine can not be called in parallel context.
+ */
+static void
+lazy_update_index_stats(Relation onerel, LVIndStats *vacindstats)
+{
+	List *indexoidlist;
+	ListCell *indexoidscan;
+	int i;
+
+	indexoidlist = RelationGetIndexList(onerel);
+	i = 0;
+
+	foreach(indexoidscan, indexoidlist)
+	{
+		Oid indexoid = lfirst_oid(indexoidscan);
+		Relation indrel;
+
+		/* Update index relation statistics if needed */
+		if (vacindstats[i].do_update)
+		{
+			indrel = index_open(indexoid, RowExclusiveLock);
+			vac_update_relstats(indrel,
+								vacindstats[i].rel_pages,
+								vacindstats[i].rel_tuples,
+								0,
+								false,
+								InvalidTransactionId,
+								InvalidMultiXactId,
+								false);
+			index_close(indrel, RowExclusiveLock);
+		}
+		i++;
+	}
+
+	list_free(indexoidlist);
+}
 
 /*
  *	lazy_vacuum_heap() -- second pass over the heap
@@ -1371,7 +1595,8 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 	npages = 0;
 
 	tupindex = 0;
-	while (tupindex < vacrelstats->num_dead_tuples)
+
+	while (tupindex < MyDeadTuple->n_dt)
 	{
 		BlockNumber tblk;
 		Buffer		buf;
@@ -1380,7 +1605,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 
 		vacuum_delay_point();
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&MyDeadTuple->dt_array[tupindex]);
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vac_strategy);
 		if (!ConditionalLockBufferForCleanup(buf))
@@ -1421,7 +1646,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
  *
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
+ * tupindex is the index in MyDeadTuple->dt_array of the first dead
  * tuple for this page.  We assume the rest follow sequentially.
  * The return value is the first tupindex after the tuples of this page.
  */
@@ -1439,16 +1664,16 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+	for (; tupindex < MyDeadTuple->n_dt; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&MyDeadTuple->dt_array[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&MyDeadTuple->dt_array[tupindex]);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
@@ -1573,7 +1798,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
  *	lazy_vacuum_index() -- vacuum one index relation.
  *
  *		Delete all the index entries pointing to tuples listed in
- *		vacrelstats->dead_tuples, and update running statistics.
+ *		MyDeadTuple->dt_array, and update running statistics.
  */
 static void
 lazy_vacuum_index(Relation indrel,
@@ -1582,6 +1807,7 @@ lazy_vacuum_index(Relation indrel,
 {
 	IndexVacuumInfo ivinfo;
 	PGRUsage	ru0;
+	double total_n_dead_tuples = 0;
 
 	pg_rusage_init(&ru0);
 
@@ -1596,10 +1822,25 @@ lazy_vacuum_index(Relation indrel,
 	*stats = index_bulk_delete(&ivinfo, *stats,
 							   lazy_tid_reaped, (void *) vacrelstats);
 
+	/* Count total number of scanned tuples during index vacuum */
+	if (vacrelstats->pstate == NULL)
+		total_n_dead_tuples = MyDeadTuple->n_dt;
+	else
+	{
+		int i;
+
+		/*
+		 * Since there is no vacuum worker who updates dead tuple during
+		 * reclaim phase. We can read them without holding lock.
+		 */
+		for (i = 0; i < vacrelstats->pstate->nworkers; i++)
+			total_n_dead_tuples += (vacrelstats->dead_tuples[i]).n_dt;
+	}
+
 	ereport(elevel,
-			(errmsg("scanned index \"%s\" to remove %d row versions",
+			(errmsg("scanned index \"%s\" to remove %0.f row versions",
 					RelationGetRelationName(indrel),
-					vacrelstats->num_dead_tuples),
+					total_n_dead_tuples),
 			 errdetail("%s.", pg_rusage_show(&ru0))));
 }
 
@@ -1609,7 +1850,8 @@ lazy_vacuum_index(Relation indrel,
 static void
 lazy_cleanup_index(Relation indrel,
 				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats)
+				   LVRelStats *vacrelstats,
+				   LVIndStats *vacindstats)
 {
 	IndexVacuumInfo ivinfo;
 	PGRUsage	ru0;
@@ -1630,17 +1872,31 @@ lazy_cleanup_index(Relation indrel,
 
 	/*
 	 * Now update statistics in pg_class, but only if the index says the count
-	 * is accurate.
+	 * is accurate. In parallel lazy vacuum, the worker can not update these
+	 * information by itself, so save to DSM and then the launcher process
+	 * updates it later.
 	 */
 	if (!stats->estimated_count)
-		vac_update_relstats(indrel,
-							stats->num_pages,
-							stats->num_index_tuples,
-							0,
-							false,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							false);
+	{
+		if (IsParallelWorker())
+		{
+			/* Save to shared memory */
+			vacindstats->do_update = true;
+			vacindstats->rel_pages = stats->num_pages;
+			vacindstats->rel_tuples = stats->num_index_tuples;
+		}
+		else
+		{
+			vac_update_relstats(indrel,
+								stats->num_pages,
+								stats->num_index_tuples,
+								0,
+								false,
+								InvalidTransactionId,
+								InvalidMultiXactId,
+								false);
+		}
+	}
 
 	ereport(elevel,
 			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
@@ -1938,62 +2194,102 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 /*
  * lazy_space_alloc - space allocation decisions for lazy vacuum
  *
+ * In parallel lazy vacuum the space for dead tuple locations are already
+ * allocated in dynamic shared memory, so we allocate space for dead tuple
+ * locations in local memory only when in not parallel lazy vacuum and set
+ * MyDeadTuple.
+ *
  * See the comments at the head of this file for rationale.
  */
 static void
 lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
 {
-	long		maxtuples;
-	int			vac_work_mem = IsAutoVacuumWorkerProcess() &&
-	autovacuum_work_mem != -1 ?
-	autovacuum_work_mem : maintenance_work_mem;
-
-	if (vacrelstats->hasindex)
+	/*
+	 * If in not parallel lazy vacuum, we need to allocate dead
+	 * tuple array in local memory.
+	 */
+	if (vacrelstats->pstate == NULL)
 	{
-		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
-		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
-
-		/* curious coding here to ensure the multiplication can't overflow */
-		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
-			maxtuples = relblocks * LAZY_ALLOC_TUPLES;
+		long maxtuples = lazy_get_max_dead_tuple(vacrelstats);
 
-		/* stay sane if small maintenance_work_mem */
-		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+		vacrelstats->dead_tuples = (LVDeadTuple *) palloc(sizeof(LVDeadTuple));
+		MyDeadTuple = vacrelstats->dead_tuples;
+		MyDeadTuple->dt_array = palloc0(sizeof(ItemPointerData) * (int)maxtuples);
+		vacrelstats->max_dead_tuples = maxtuples;
 	}
 	else
 	{
-		maxtuples = MaxHeapTuplesPerPage;
+		/*
+		 * In parallel lazy vacuum, we initialize the dead tuple array.
+		 * LVDeadTuple array is structed at the beginning of dead_tuples variable,
+		 * so remaining space can be used for dead tuple array. The dt_base variable
+		 * points to the beginning of dead tuple array.
+		 */
+
+		char *dt_base = (char *)vacrelstats->dead_tuples;
+		LVDeadTuple *dt = &(vacrelstats->dead_tuples[ParallelWorkerNumber]);
+
+		/* Adjust dt_base to the beginning of dead tuple array */
+		dt_base += sizeof(LVDeadTuple) * vacrelstats->pstate->nworkers;
+		dt->dt_array = (ItemPointer)
+			(dt_base + sizeof(ItemPointerData) * vacrelstats->max_dead_tuples * ParallelWorkerNumber);
+
+		/* set MyDeadTuple */
+		MyDeadTuple = dt;
 	}
 
-	vacrelstats->num_dead_tuples = 0;
-	vacrelstats->max_dead_tuples = (int) maxtuples;
-	vacrelstats->dead_tuples = (ItemPointer)
-		palloc(maxtuples * sizeof(ItemPointerData));
+	MyDeadTuple->n_dt = 0;
 }
 
 /*
  * lazy_record_dead_tuple - remember one deletable tuple
  */
 static void
-lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr)
+lazy_record_dead_tuple(LVRelStats *vacrelstats, ItemPointer itemptr)
 {
 	/*
 	 * The array shouldn't overflow under normal behavior, but perhaps it
 	 * could if we are given a really small maintenance_work_mem. In that
 	 * case, just forget the last few tuples (we'll get 'em next time).
 	 */
-	if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
+	if (MyDeadTuple->n_dt < vacrelstats->max_dead_tuples)
 	{
-		vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
-		vacrelstats->num_dead_tuples++;
-		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-									 vacrelstats->num_dead_tuples);
+		/*
+		 * In parallel lzy vacuum, since each parallel vacuum worker has
+		 * its own dead tuple array we don't need to do this exclusively.
+		 */
+		MyDeadTuple->dt_array[MyDeadTuple->n_dt] = *itemptr;
+		(MyDeadTuple->n_dt)++;
+
+		/* XXX : Update progress information here */
 	}
 }
 
 /*
+ *  lazy_clear_dead_tuple() -- clear dead tuple list
+ */
+static void
+lazy_clear_dead_tuple(LVRelStats *vacrelstats)
+{
+	/*
+	 * In parallel lazy vacuum one of the parallel worker is responsible
+	 * for clearing all dead tuples. Note that we're assumed that only
+	 * one process touches the dead tuple array.
+	 */
+	if (vacrelstats->pstate != NULL && vacrelstats->nindexes != 0)
+	{
+		int i;
+		for (i = 0; i < vacrelstats->pstate->nworkers; i++)
+		{
+			LVDeadTuple *dead_tuples = &(vacrelstats->dead_tuples[i]);
+			dead_tuples->n_dt = 0;
+		}
+	}
+	else
+		MyDeadTuple->n_dt = 0;
+}
+
+/*
  *	lazy_tid_reaped() -- is a particular tid deletable?
  *
  *		This has the right signature to be an IndexBulkDeleteCallback.
@@ -2005,14 +2301,33 @@ lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
 	LVRelStats *vacrelstats = (LVRelStats *) state;
 	ItemPointer res;
+	int i;
+	int num = (vacrelstats->pstate == NULL) ? 1 : vacrelstats->pstate->nworkers;
+
+	/*
+	 * In parallel lazy vacuum all dead tuple TID locations are stored into
+	 * dynamic shared memory together and entire dead tuple arrays is not
+	 * ordered. However since each dead tuple array corresponding vacuum
+	 * worker is ordered by TID location we can search 'num' times. Here
+	 * since no write happends vacuum worker access the dead tuple array
+	 * without holding lock.
+	 */
+	for (i = 0; i < num; i++)
+	{
+		ItemPointer dead_tuples = (vacrelstats->dead_tuples[i]).dt_array;
+		int n_tuples = (vacrelstats->dead_tuples[i]).n_dt;
 
-	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) vacrelstats->dead_tuples,
-								vacrelstats->num_dead_tuples,
-								sizeof(ItemPointerData),
-								vac_cmp_itemptr);
+		res = (ItemPointer) bsearch((void *) itemptr,
+									(void *) dead_tuples,
+									n_tuples,
+									sizeof(ItemPointerData),
+									vac_cmp_itemptr);
 
-	return (res != NULL);
+		if (res != NULL)
+			return true;
+	}
+
+	return false;
 }
 
 /*
@@ -2156,3 +2471,649 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 	return all_visible;
 }
+
+/*
+ * Return the block number we need to scan next, or InvalidBlockNumber if scan
+ * is done.
+ *
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages If we're not in parallel
+ * mode.  Since we're reading sequentially, the OS should be doing readahead
+ * for us, so there's no gain in skipping a page now and then; that's likely
+ * to disable readahead and so be counterproductive. Also, skipping even a
+ * single page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * In not parallel mode, before entering the main loop, establish the
+ * invariant that next_unskippable_block is the next block number >= blkno
+ * that's not we can't skip based on the visibility map, either all-visible
+ * for a regular scan or all-frozen for an aggressive scan.  We set it to
+ * nblocks if there's no such block.  We also set up the skipping_blocks
+ * flag correctly at this stage.
+ *
+ * In parallel mode, vacrelstats->pstate is not NULL. We scan heap pages
+ * using parallel heap scan description. Each worker calls heap_parallelscan_nextpage()
+ * in order to exclusively get  block number we need to scan at next.
+ * If given block is all-visible according to visibility map, we skip to
+ * scan this block immediately unlike not parallel lazy scan.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding
+ * heap page or locking the buffer.  This is OK.  If we mistakenly think
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page.  It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them.  But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared.  Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
+ *
+ * We will scan the table's last page, at least to the extent of
+ * determining whether it has tuples or not, even if it should be skipped
+ * according to the above rules; except when we've already determined that
+ * it's not worth trying to truncate the table.  This avoids having
+ * lazy_truncate_heap() take access-exclusive lock on the table to attempt
+ * a truncation that just fails immediately because there are tuples in
+ * the last page.  This is worth avoiding mainly because such a lock must
+ * be replayed on any hot standby, where it can be disruptive.
+ */
+static BlockNumber
+lazy_scan_heap_get_nextpage(Relation onerel, LVRelStats *vacrelstats,
+							LVScanDesc lvscan, bool *all_visible_according_to_vm,
+							Buffer *vmbuffer, int options, bool aggressive)
+{
+	BlockNumber blkno;
+
+	if (vacrelstats->pstate != NULL)
+	{
+		/*
+		 * In parallel lazy vacuum since it's hard to know how many consecutive
+		 * all-visible pages exits on table we skip to scan the heap page immediately.
+		 * if it is all-visible page.
+		 */
+		while ((blkno = heap_parallelscan_nextpage(lvscan->heapscan)) != InvalidBlockNumber)
+		{
+			*all_visible_according_to_vm = false;
+			vacuum_delay_point();
+
+			/* Consider to skip scan page according visibility map */
+			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0 &&
+				!FORCE_CHECK_PAGE(blkno))
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, blkno, vmbuffer);
+
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+					{
+						vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+					else if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+						*all_visible_according_to_vm = true;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+					{
+						if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+							vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+				}
+			}
+
+			/* We need to scan current blkno, break */
+			break;
+		}
+	}
+	else
+	{
+		bool skipping_blocks = false;
+
+		/* Initialize lv_nextunskippable_page if needed */
+		if (lvscan->lv_cblock == 0 && (options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+		{
+			while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, lvscan->lv_next_unskippable_block,
+													vmbuffer);
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+						break;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+						break;
+				}
+				vacuum_delay_point();
+				lvscan->lv_next_unskippable_block++;
+			}
+
+			if (lvscan->lv_next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+				skipping_blocks = true;
+			else
+				skipping_blocks = false;
+		}
+
+		/* Decide the block number we need to scan */
+		for (blkno = lvscan->lv_cblock; blkno < lvscan->lv_nblocks; blkno++)
+		{
+			if (blkno == lvscan->lv_next_unskippable_block)
+			{
+				/* Time to advance next_unskippable_block */
+				lvscan->lv_next_unskippable_block++;
+				if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+				{
+					while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+					{
+						uint8		vmstatus;
+
+						vmstatus = visibilitymap_get_status(onerel,
+															lvscan->lv_next_unskippable_block,
+															vmbuffer);
+						if (aggressive)
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+								break;
+						}
+						else
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+								break;
+						}
+						vacuum_delay_point();
+						lvscan->lv_next_unskippable_block++;
+					}
+				}
+
+				/*
+				 * We know we can't skip the current block.  But set up
+				 * skipping_all_visible_blocks to do the right thing at the
+				 * following blocks.
+				 */
+				if (lvscan->lv_next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
+					skipping_blocks = true;
+				else
+					skipping_blocks = false;
+
+				/*
+				 * Normally, the fact that we can't skip this block must mean that
+				 * it's not all-visible.  But in an aggressive vacuum we know only
+				 * that it's not all-frozen, so it might still be all-visible.
+				 */
+				if (aggressive && VM_ALL_VISIBLE(onerel, blkno, vmbuffer))
+					*all_visible_according_to_vm = true;
+
+				/* Found out that next unskippable block number */
+				break;
+			}
+			else
+			{
+				/*
+				 * The current block is potentially skippable; if we've seen a
+				 * long enough run of skippable blocks to justify skipping it, and
+				 * we're not forced to check it, then go ahead and skip.
+				 * Otherwise, the page must be at least all-visible if not
+				 * all-frozen, so we can set all_visible_according_to_vm = true.
+				 */
+				if (skipping_blocks && !FORCE_CHECK_PAGE(blkno))
+				{
+					/*
+					 * Tricky, tricky.  If this is in aggressive vacuum, the page
+					 * must have been all-frozen at the time we checked whether it
+					 * was skippable, but it might not be any more.  We must be
+					 * careful to count it as a skipped all-frozen page in that
+					 * case, or else we'll think we can't update relfrozenxid and
+					 * relminmxid.  If it's not an aggressive vacuum, we don't
+					 * know whether it was all-frozen, so we have to recheck; but
+					 * in this case an approximate answer is OK.
+					 */
+					if (aggressive || VM_ALL_FROZEN(onerel, blkno, vmbuffer))
+						vacrelstats->frozenskipped_pages++;
+					continue;
+				}
+
+				*all_visible_according_to_vm = true;
+
+				/* We need to scan current blkno, break */
+				break;
+			}
+		} /* for */
+
+		/* Advance the current block number for the next scan */
+		lvscan->lv_cblock = blkno + 1;
+	}
+
+	return (blkno == lvscan->lv_nblocks) ? InvalidBlockNumber : blkno;
+}
+
+/*
+ * Begin lazy vacuum scan. lvscan->heapscan is NULL if
+ * we're not in parallel lazy vacuum.
+ */
+static LVScanDesc
+lv_beginscan(LVRelStats *vacrelstats, ParallelHeapScanDesc pscan,
+			 Relation onerel)
+{
+	LVScanDesc lvscan;
+
+	lvscan = (LVScanDesc) palloc(sizeof(LVScanDescData));
+
+	lvscan->lv_cblock = 0;
+	lvscan->lv_next_unskippable_block = 0;
+	lvscan->lv_nblocks = vacrelstats->rel_pages;
+
+	if (pscan != NULL)
+		lvscan->heapscan = heap_beginscan_parallel(onerel, pscan);
+	else
+		lvscan->heapscan = NULL;
+
+	return lvscan;
+}
+
+/*
+ * End lazy vacuum scan.
+ */
+static void
+lv_endscan(LVScanDesc lvscan)
+{
+	if (lvscan->heapscan != NULL)
+		heap_endscan(lvscan->heapscan);
+	pfree(lvscan);
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Lazy Vacuum Support
+ * ----------------------------------------------------------------
+ */
+
+/*
+ * Estimate storage for parallel lazy vacuum.
+ */
+static void
+lazy_estimate_dsm(ParallelContext *pcxt, long maxtuples, int nindexes)
+{
+	int size = 0;
+	int keys = 0;
+
+	/* Estimate size for parallel heap scan */
+	size += heap_parallelscan_estimate(SnapshotAny);
+	keys++;
+
+	/* Estimate size for vacuum statistics */
+	size += BUFFERALIGN(sizeof(LVRelStats) * pcxt->nworkers);
+	keys++;
+
+	/* Estimate size fo index vacuum statistics */
+	size += BUFFERALIGN(sizeof(LVIndStats) * nindexes);
+	keys++;
+
+	/* Estimate size for dead tuple arrays */
+	size += BUFFERALIGN((sizeof(LVDeadTuple) + sizeof(ItemPointerData) * maxtuples) * pcxt->nworkers);
+	keys++;
+
+	/* Estimate size for parallel lazy vacuum state */
+	size += BUFFERALIGN(sizeof(LVParallelState) + sizeof(VacWorker) * pcxt->nworkers);
+	keys++;
+
+	/* Estimate size for vacuum task */
+	size += BUFFERALIGN(sizeof(VacuumTask));
+	keys++;
+
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, keys);
+}
+
+/*
+ * Initialize dynamic shared memory for parallel lazy vacuum. We store
+ * relevant informations of parallel heap scanning, dead tuple array
+ * and vacuum statistics for each worker and some parameters for
+ * lazy vacuum.
+ */
+static void
+lazy_initialize_dsm(ParallelContext *pcxt, Relation onerel,
+					LVRelStats *vacrelstats, int options,
+					bool aggressive)
+{
+	ParallelHeapScanDesc pscan;
+	LVRelStats *lvrelstats;
+	LVIndStats *lvindstats;
+	LVDeadTuple *dead_tuples;
+	LVParallelState *pstate;
+	VacuumTask	*vacuum_task;
+	int i;
+	int dead_tuples_size;
+	int pstate_size;
+
+	/* Allocate and initialize DSM for parallel scan description */
+	pscan = (ParallelHeapScanDesc) shm_toc_allocate(pcxt->toc,
+													heap_parallelscan_estimate(SnapshotAny));
+
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_SCAN, pscan);
+	heap_parallelscan_initialize(pscan, onerel, SnapshotAny);
+
+	/* Allocate and initialize DSM for vacuum stats for each worker */
+	lvrelstats = (LVRelStats *)shm_toc_allocate(pcxt->toc,
+											 sizeof(LVRelStats) * pcxt->nworkers);
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_STATS, lvrelstats);
+	for (i = 0; i < pcxt->nworkers; i++)
+	{
+		LVRelStats *stats = lvrelstats + sizeof(LVRelStats) * i;
+
+		memcpy(stats, vacrelstats, sizeof(LVRelStats));
+	}
+
+	/* Allocate and initialize DSM for dead tuple array */
+	dead_tuples_size = sizeof(LVDeadTuple) * pcxt->nworkers;
+	dead_tuples_size += sizeof(ItemPointerData) * vacrelstats->max_dead_tuples * pcxt->nworkers;
+	dead_tuples = (LVDeadTuple *) shm_toc_allocate(pcxt->toc, dead_tuples_size);
+	vacrelstats->dead_tuples = dead_tuples;
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLES, dead_tuples);
+
+	/* Allocate DSM for index vacuum statistics */
+	lvindstats = (LVIndStats *) shm_toc_allocate(pcxt->toc,
+												 sizeof(LVIndStats) * vacrelstats->nindexes);
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_INDEX_STATS, lvindstats);
+
+
+	/* Allocate and initialize DSM for parallel state */
+	pstate_size = sizeof(LVParallelState) + sizeof(VacWorker) * pcxt->nworkers;
+	pstate = (LVParallelState *) shm_toc_allocate(pcxt->toc, pstate_size);
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_STATE, pstate);
+	pstate->nworkers = pcxt->nworkers;
+	ConditionVariableInit(&(pstate->cv));
+	SpinLockInit(&(pstate->mutex));
+
+	/* Allocate and initialize DSM for vacuum task */
+	vacuum_task = (VacuumTask *) shm_toc_allocate(pcxt->toc, sizeof(VacuumTask));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_TASK, vacuum_task);
+	vacuum_task->aggressive = aggressive;
+	vacuum_task->options = options;
+	vacuum_task->oldestxmin = OldestXmin;
+	vacuum_task->freezelimit = FreezeLimit;
+	vacuum_task->multixactcutoff = MultiXactCutoff;
+	vacuum_task->elevel = elevel;
+}
+
+/*
+ * Initialize parallel lazy vacuum for worker.
+ */
+static void
+lazy_initialize_worker(shm_toc *toc, ParallelHeapScanDesc *pscan,
+						   LVRelStats **vacrelstats, int *options,
+						   bool *aggressive)
+{
+	LVRelStats *lvstats;
+	LVIndStats *vacindstats;
+	VacuumTask	*vacuum_task;
+	LVDeadTuple *dead_tuples;
+	LVParallelState *pstate;
+
+	/* Set up parallel heap scan description */
+	*pscan = (ParallelHeapScanDesc) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_SCAN);
+
+	/* Set up vacuum stats */
+	lvstats = (LVRelStats *) shm_toc_lookup(toc, VACUUM_KEY_VACUUM_STATS);
+	*vacrelstats = lvstats + sizeof(LVRelStats) * ParallelWorkerNumber;
+
+	/* Set up vacuum index statistics */
+	vacindstats = (LVIndStats *) shm_toc_lookup(toc, VACUUM_KEY_INDEX_STATS);
+	(*vacrelstats)->vacindstats = (LVIndStats *)vacindstats;
+
+	/* Set up dead tuple list */
+	dead_tuples = (LVDeadTuple *) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLES);
+	(*vacrelstats)->dead_tuples = dead_tuples;
+
+	/* Set up vacuum task */
+	vacuum_task = (VacuumTask *) shm_toc_lookup(toc, VACUUM_KEY_VACUUM_TASK);
+
+	/* Set up parallel vacuum state */
+	pstate = (LVParallelState *) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_STATE);
+	(*vacrelstats)->pstate = pstate;
+	MyVacWorker = &(pstate->vacworkers[ParallelWorkerNumber]);
+	MyVacWorker->state = VACSTATE_STARTUP;
+
+	/* Set up parameters for lazy vacuum */
+	OldestXmin = vacuum_task->oldestxmin;
+	FreezeLimit = vacuum_task->freezelimit;
+	MultiXactCutoff = vacuum_task->multixactcutoff;
+	elevel = vacuum_task->elevel;
+	*options = vacuum_task->options;
+	*aggressive = vacuum_task->aggressive;
+}
+
+/*
+ * Set my vacuum state exclusively and wait until all vacuum workers
+ * finish vacuum.
+ */
+static void
+lazy_set_vacstate_and_wait_finished(LVRelStats *vacrelstats)
+{
+	LVParallelState *pstate = vacrelstats->pstate;
+	uint32 round;
+	int n_count, n_comp;
+
+	/* Exit if in not parallel vacuum */
+	if (pstate == NULL)
+		return;
+
+	SpinLockAcquire(&(pstate->mutex));
+
+	/* Change my vacstate */
+	round = MyVacWorker->round;
+	MyVacWorker->state = VACSTATE_VACUUM_FINISHED;
+
+	/* Check all vacuum worker states */
+	n_count = lazy_count_vacstate_finished(pstate, round, &n_comp);
+
+	/*
+	 * If I'm a last running worker that has reached to here then clear
+	 * dead tuple. Note that clearing dead tuple array must be done
+	 * by only one worker and during holding lock.
+	 */
+	if ((n_count + n_comp) == pstate->nworkers)
+		lazy_clear_dead_tuple(vacrelstats);
+
+	SpinLockRelease(&(pstate->mutex));
+
+	ConditionVariablePrepareToSleep(&(pstate->cv));
+
+	/* Sleep until all of vacuum workers reached here */
+	while (!lazy_check_vacstate_finished(pstate, round))
+		ConditionVariableSleep(&(pstate->cv), WAIT_EVENT_PARALLEL_FINISH);
+
+	ConditionVariableCancelSleep();
+
+	/* For next round scan, change its state and increment round number */
+	lazy_set_my_vacstate(pstate, VACSTATE_SCANNING, true, false);
+}
+
+/*
+ * Set my vacuum state exclusively and wait until all vacuum workers
+ * prepared vacuum.
+ */
+static void
+lazy_set_vacstate_and_wait_prepared(LVParallelState *pstate)
+{
+	uint32 round;
+
+	/* Exit if in not parallel vacuum */
+	if (pstate == NULL)
+		return;
+
+	/* update my vacstate */
+	round = lazy_set_my_vacstate(pstate, VACSTATE_VACUUM_PREPARED, false, true);
+
+	ConditionVariablePrepareToSleep(&(pstate->cv));
+
+	/* Sleep until all of vacuum workers reached here */
+	while (!lazy_check_vacstate_prepared(pstate, round))
+		ConditionVariableSleep(&(pstate->cv), WAIT_EVENT_PARALLEL_FINISH);
+
+	ConditionVariableCancelSleep();
+
+	/* For next round scan, change its state */
+	lazy_set_my_vacstate(pstate, VACSTATE_VACUUMING, false, false);
+}
+
+/*
+ * Set my vacstate. After set state we increment its round and notice other
+ * waiting process if required. Return current its round number.
+ */
+static uint32
+lazy_set_my_vacstate(LVParallelState *pstate, uint8 state, bool nextloop,
+					 bool broadcast)
+{
+	uint32 round;
+
+	/* Quick exit if in not parallel vacuum */
+	if (pstate == NULL)
+		return 0;
+
+	Assert(IsParallelWorker());
+
+	SpinLockAcquire(&(pstate->mutex));
+
+	MyVacWorker->state = state;
+	round = MyVacWorker->round;
+
+	/* Increment its round number */
+	if (nextloop)
+		(MyVacWorker->round)++;
+
+	SpinLockRelease(&(pstate->mutex));
+
+	/* Notice other waiting vacuum worker */
+	if (broadcast)
+		ConditionVariableBroadcast(&(pstate->cv));
+
+	return round;
+}
+
+/*
+ * Check if all vacuum workers have finished to scan heap and prepared to
+ * reclaim dead tuple. Return true if all vacuum workers have prepared.
+ * Otherwise return false.
+ */
+static bool
+lazy_check_vacstate_prepared(LVParallelState *pstate, uint32 round)
+{
+	int n_count = 0;
+	int n_comp = 0;
+	int i;
+
+	SpinLockAcquire(&(pstate->mutex));
+
+	/*
+	 * Count vacuum workers who are in coutable_state on same round and
+	 * who are in VACSTATE_COMPLETE state.
+	 */
+	for (i = 0; i < pstate->nworkers; i++)
+	{
+		VacWorker *vacworker = &(pstate->vacworkers[i]);
+		uint32 w_round = vacworker->round;
+
+		if ((vacworker->state & VACPHASE_VACUUM) != 0 && w_round ==  round)
+			n_count++;
+		else if (vacworker->state == VACSTATE_COMPLETE)
+			n_comp++;
+	}
+
+	SpinLockRelease(&(pstate->mutex));
+
+	return (n_count + n_comp) == pstate->nworkers;
+}
+
+/*
+ * Check if all vacuum workers have finished vacuum on table and index.
+ * Return true if all vacuum workers have finished. Otherwise return false.
+ */
+static bool
+lazy_check_vacstate_finished(LVParallelState *pstate, uint32 round)
+{
+	int n_count, n_comp;
+
+	SpinLockAcquire(&(pstate->mutex));
+	n_count = lazy_count_vacstate_finished(pstate, round, &n_comp);
+	SpinLockRelease(&(pstate->mutex));
+
+	return (n_count + n_comp) == pstate->nworkers;
+}
+
+/*
+ * When counting the number of vacuum worker who has finished to vacuum
+ * on table and index, some vacuum workers could proceed to subsequent
+ * state on next round. We count the number of vacuum worker who is in the same
+ * state or is in subsequent state on next round. Caller must hold mutex lock.
+ */
+static int
+lazy_count_vacstate_finished(LVParallelState *pstate, uint32 round, int *n_complete)
+{
+	int n_count = 0;
+	int n_comp = 0;
+	int i;
+
+	for (i = 0; i < pstate->nworkers; i++)
+	{
+		VacWorker *vacworker = &(pstate->vacworkers[i]);
+		uint32 w_round = vacworker->round;
+
+		if (((vacworker->state & VACSTATE_VACUUM_FINISHED) != 0 && w_round == round) ||
+			((vacworker->state & VACPHASE_SCAN) != 0 && w_round == (round + 1)))
+			n_count++;
+		else if (vacworker->state == VACSTATE_COMPLETE)
+			n_comp++;
+	}
+
+	*n_complete = n_comp;
+
+	return n_count;
+}
+
+/*
+ * Return the number of maximum dead tuples can be stored according
+ * to vac_work_mem.
+ */
+static long
+lazy_get_max_dead_tuple(LVRelStats *vacrelstats)
+{
+	long maxtuples;
+	int	vac_work_mem = IsAutoVacuumWorkerProcess() &&
+		autovacuum_work_mem != -1 ?
+		autovacuum_work_mem : maintenance_work_mem;
+
+	if (vacrelstats->nindexes != 0)
+	{
+		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+		maxtuples = Min(maxtuples, INT_MAX);
+		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+
+		/* curious coding here to ensure the multiplication can't overflow */
+		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > vacrelstats->old_rel_pages)
+			maxtuples = vacrelstats->old_rel_pages * LAZY_ALLOC_TUPLES;
+
+		/* stay sane if small maintenance_work_mem */
+		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+	}
+	else
+	{
+		maxtuples = MaxHeapTuplesPerPage;
+	}
+
+	return maxtuples;
+}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index a27e5ed..c3bf0d9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1595,7 +1595,12 @@ _equalDropdbStmt(const DropdbStmt *a, const DropdbStmt *b)
 static bool
 _equalVacuumStmt(const VacuumStmt *a, const VacuumStmt *b)
 {
-	COMPARE_SCALAR_FIELD(options);
+	if (a->options.flags != b->options.flags)
+		return false;
+
+	if (a->options.nworkers != b->options.nworkers)
+		return false;
+
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_NODE_FIELD(va_cols);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9eef550..dcf0353 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -178,6 +178,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
 			   bool *deferrable, bool *initdeferred, bool *not_valid,
 			   bool *no_inherit, core_yyscan_t yyscanner);
 static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
+static VacuumOptions *makeVacOpt(VacuumOption opt, int nworkers);
 
 %}
 
@@ -228,6 +229,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	struct ImportQual	*importqual;
 	InsertStmt			*istmt;
 	VariableSetStmt		*vsetstmt;
+	VacuumOptions		*vacopts;
 	PartitionElem		*partelem;
 	PartitionSpec		*partspec;
 	PartitionRangeDatum	*partrange_datum;
@@ -292,7 +294,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 				create_extension_opt_item alter_extension_opt_item
 
 %type <ival>	opt_lock lock_type cast_context
-%type <ival>	vacuum_option_list vacuum_option_elem
+%type <vacopts>	vacuum_option_list vacuum_option_elem
 %type <boolean>	opt_or_replace
 				opt_grant_grant_option opt_grant_admin_option
 				opt_nowait opt_if_exists opt_with_data
@@ -9720,47 +9722,59 @@ cluster_index_specification:
 VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 1;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose qualified_name
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 1;
 					n->relation = $5;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose AnalyzeStmt
 				{
 					VacuumStmt *n = (VacuumStmt *) $5;
-					n->options |= VACOPT_VACUUM;
+					n->options.flags |= VACOPT_VACUUM;
 					if ($2)
-						n->options |= VACOPT_FULL;
+						n->options.flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						n->options.flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						n->options.flags |= VACOPT_VERBOSE;
+					n->options.nworkers = 1;
 					$$ = (Node *)n;
 				}
 			| VACUUM '(' vacuum_option_list ')'
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions *vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *) n;
@@ -9768,29 +9782,52 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 			| VACUUM '(' vacuum_option_list ')' qualified_name opt_name_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions *vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->relation = $5;
 					n->va_cols = $6;
 					if (n->va_cols != NIL)	/* implies analyze */
-						n->options |= VACOPT_ANALYZE;
+						n->options.flags |= VACOPT_ANALYZE;
 					$$ = (Node *) n;
 				}
 		;
 
 vacuum_option_list:
 			vacuum_option_elem								{ $$ = $1; }
-			| vacuum_option_list ',' vacuum_option_elem		{ $$ = $1 | $3; }
+			| vacuum_option_list ',' vacuum_option_elem
+			{
+				VacuumOptions *vacopts1 = (VacuumOptions *)$1;
+				VacuumOptions *vacopts2 = (VacuumOptions *)$3;
+
+				vacopts1->flags |= vacopts2->flags;
+				if (vacopts1->nworkers < vacopts2->nworkers)
+					vacopts1->nworkers = vacopts2->nworkers;
+
+				$$ = vacopts1;
+				pfree(vacopts2);
+			}
 		;
 
 vacuum_option_elem:
-			analyze_keyword		{ $$ = VACOPT_ANALYZE; }
-			| VERBOSE			{ $$ = VACOPT_VERBOSE; }
-			| FREEZE			{ $$ = VACOPT_FREEZE; }
-			| FULL				{ $$ = VACOPT_FULL; }
+			analyze_keyword		{ $$ = makeVacOpt(VACOPT_ANALYZE, 1); }
+			| VERBOSE			{ $$ = makeVacOpt(VACOPT_VERBOSE, 1); }
+			| FREEZE			{ $$ = makeVacOpt(VACOPT_FREEZE, 1); }
+			| FULL				{ $$ = makeVacOpt(VACOPT_FULL, 1); }
+			| PARALLEL ICONST
+				{
+					if ($2 < 1)
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("parallel vacuum degree must be more than 1"),
+								 parser_errposition(@1)));
+					$$ = makeVacOpt(VACOPT_PARALLEL, $2);
+				}
 			| IDENT
 				{
 					if (strcmp($1, "disable_page_skipping") == 0)
-						$$ = VACOPT_DISABLE_PAGE_SKIPPING;
+						$$ = makeVacOpt(VACOPT_DISABLE_PAGE_SKIPPING, 1);
 					else
 						ereport(ERROR,
 								(errcode(ERRCODE_SYNTAX_ERROR),
@@ -9798,27 +9835,36 @@ vacuum_option_elem:
 									 parser_errposition(@1)));
 				}
 		;
-
 AnalyzeStmt:
 			analyze_keyword opt_verbose
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 1;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| analyze_keyword opt_verbose qualified_name opt_name_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 1;
 					n->relation = $3;
 					n->va_cols = $4;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 		;
 
@@ -15284,6 +15330,16 @@ makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
 	return (Node *) s;
 }
 
+static VacuumOptions *
+makeVacOpt(VacuumOption opt, int nworkers)
+{
+	VacuumOptions *vacopts = palloc(sizeof(VacuumOptions));
+
+	vacopts->flags = opt;
+	vacopts->nworkers = nworkers;
+	return vacopts;
+}
+
 /* parser_init()
  * Initialize to parse one query string
  */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 251b9fe..5cc683f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -186,7 +186,7 @@ typedef struct av_relation
 typedef struct autovac_table
 {
 	Oid			at_relid;
-	int			at_vacoptions;	/* bitmask of VacuumOption */
+	VacuumOptions at_vacoptions;	/* contains bitmask of VacuumOption */
 	VacuumParams at_params;
 	int			at_vacuum_cost_delay;
 	int			at_vacuum_cost_limit;
@@ -2414,7 +2414,7 @@ do_autovacuum(void)
 			 * next table in our list.
 			 */
 			HOLD_INTERRUPTS();
-			if (tab->at_vacoptions & VACOPT_VACUUM)
+			if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 				errcontext("automatic vacuum of table \"%s.%s.%s\"",
 						   tab->at_datname, tab->at_nspname, tab->at_relname);
 			else
@@ -2651,10 +2651,11 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab = palloc(sizeof(autovac_table));
 		tab->at_relid = relid;
 		tab->at_sharedrel = classForm->relisshared;
-		tab->at_vacoptions = VACOPT_SKIPTOAST |
+		tab->at_vacoptions.flags = VACOPT_SKIPTOAST |
 			(dovacuum ? VACOPT_VACUUM : 0) |
 			(doanalyze ? VACOPT_ANALYZE : 0) |
 			(!wraparound ? VACOPT_NOWAIT : 0);
+		tab->at_vacoptions.nworkers = 1;
 		tab->at_params.freeze_min_age = freeze_min_age;
 		tab->at_params.freeze_table_age = freeze_table_age;
 		tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
@@ -2901,10 +2902,10 @@ autovac_report_activity(autovac_table *tab)
 	int			len;
 
 	/* Report the command and possible options */
-	if (tab->at_vacoptions & VACOPT_VACUUM)
+	if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: VACUUM%s",
-				 tab->at_vacoptions & VACOPT_ANALYZE ? " ANALYZE" : "");
+				 tab->at_vacoptions.flags & VACOPT_ANALYZE ? " ANALYZE" : "");
 	else
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: ANALYZE");
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 127dc86..ab435ce 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -654,7 +654,7 @@ standard_ProcessUtility(Node *parsetree,
 				VacuumStmt *stmt = (VacuumStmt *) parsetree;
 
 				/* we choose to allow this during "read only" transactions */
-				PreventCommandDuringRecovery((stmt->options & VACOPT_VACUUM) ?
+				PreventCommandDuringRecovery((stmt->options.flags & VACOPT_VACUUM) ?
 											 "VACUUM" : "ANALYZE");
 				/* forbidden in parallel mode due to CommandIsReadOnly */
 				ExecVacuum(stmt, isTopLevel);
@@ -2394,7 +2394,7 @@ CreateCommandTag(Node *parsetree)
 			break;
 
 		case T_VacuumStmt:
-			if (((VacuumStmt *) parsetree)->options & VACOPT_VACUUM)
+			if (((VacuumStmt *) parsetree)->options.flags & VACOPT_VACUUM)
 				tag = "VACUUM";
 			else
 				tag = "ANALYZE";
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 92afc32..37d6780 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1991,7 +1991,6 @@ EstimateSnapshotSpace(Snapshot snap)
 	Size		size;
 
 	Assert(snap != InvalidSnapshot);
-	Assert(snap->satisfies == HeapTupleSatisfiesMVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
 	size = add_size(sizeof(SerializedSnapshotData),
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ee7e05a..712f70e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -131,6 +131,7 @@ extern Size heap_parallelscan_estimate(Snapshot snapshot);
 extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
 							 Relation relation, Snapshot snapshot);
 extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
+extern BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 541c2fa..7fecbae 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "access/heapam.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -158,7 +159,7 @@ extern int	vacuum_multixact_freeze_table_age;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
-extern void vacuum(int options, RangeVar *relation, Oid relid,
+extern void vacuum(VacuumOptions options, RangeVar *relation, Oid relid,
 	   VacuumParams *params, List *va_cols,
 	   BufferAccessStrategy bstrategy, bool isTopLevel);
 extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
@@ -189,7 +190,7 @@ extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
 
 /* in commands/vacuumlazy.c */
-extern void lazy_vacuum_rel(Relation onerel, int options,
+extern void lazy_vacuum_rel(Relation onerel, VacuumOptions options,
 				VacuumParams *params, BufferAccessStrategy bstrategy);
 
 /* in commands/analyze.c */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 7ceaa22..d19dad7 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2936,13 +2936,20 @@ typedef enum VacuumOption
 	VACOPT_FULL = 1 << 4,		/* FULL (non-concurrent) vacuum */
 	VACOPT_NOWAIT = 1 << 5,		/* don't wait to get lock (autovacuum only) */
 	VACOPT_SKIPTOAST = 1 << 6,	/* don't process the TOAST table, if any */
-	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7		/* don't skip any pages */
+	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7,		/* don't skip any pages */
+	VACOPT_PARALLEL = 1 << 8	/* do VACUUM parallelly */
 } VacuumOption;
 
+typedef struct VacuumOptions
+{
+	VacuumOption flags; /* OR of VacuumOption flags */
+	int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;
+
 typedef struct VacuumStmt
 {
 	NodeTag		type;
-	int			options;		/* OR of VacuumOption flags */
+	VacuumOptions	options;
 	RangeVar   *relation;		/* single table to process, or NULL */
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumStmt;
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 9b604be..bc83323 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -80,5 +80,6 @@ CONTEXT:  SQL function "do_analyze" statement 1
 SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 7b819f6..46355ec 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -61,6 +61,7 @@ VACUUM FULL vaccluster;
 VACUUM FULL vactst;
 
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
 
 DROP TABLE vaccluster;
 DROP TABLE vactst;

#23

Claudio Freire

klaussfreire@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#22)

Re: Block level parallel vacuum WIP

On Fri, Jan 6, 2017 at 2:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

table_size | indexes | parallel_degree | time
------------+---------+-----------------+----------
6.5GB | 0 | 1 | 00:00:14
6.5GB | 0 | 2 | 00:00:02
6.5GB | 0 | 4 | 00:00:02

Those numbers look highly suspect.

Are you sure you're not experiencing caching effects? (ie: maybe you
ran the second and third vacuums after the first, and didn't flush the
page cache, so the table was cached)

6.5GB | 2 | 1 | 00:02:18
6.5GB | 2 | 2 | 00:00:38
6.5GB | 2 | 4 | 00:00:46

...

13GB | 0 | 1 | 00:03:52
13GB | 0 | 2 | 00:00:49
13GB | 0 | 4 | 00:00:50

13GB | 2 | 1 | 00:12:42
13GB | 2 | 2 | 00:01:17
13GB | 2 | 4 | 00:02:12

These would also be consistent with caching effects

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#24

Amit Kapila

amit.kapila16@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#22)

Re: Block level parallel vacuum WIP

On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.

I got some advices at PGConf.ASIA 2016 and started to work on this again.

The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page.

How are planning to ensure the same in deadlock detector? Currently,
deadlock detector considers members from same lock group as
non-blocking. If you think we don't need to make any changes in
deadlock detector, then explain why so?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#25

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Amit Kapila (#24)

Re: Block level parallel vacuum WIP

On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.

I got some advices at PGConf.ASIA 2016 and started to work on this again.

The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page.

How are planning to ensure the same in deadlock detector? Currently,
deadlock detector considers members from same lock group as
non-blocking. If you think we don't need to make any changes in
deadlock detector, then explain why so?

Thank you for comment.
I had not considered necessity of dead lock detection support. But
because lazy_scan_heap actquires the relation extension lock and
release it before acquiring another extension lock, I guess we don't
need that changes for parallel lazy vacuum. Thought?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#26

Simon Riggs

simon@2ndquadrant.com

about 9 years ago

In reply to: Masahiko Sawada (#25)

Re: Block level parallel vacuum WIP

On 9 January 2017 at 08:48, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I had not considered necessity of dead lock detection support.

It seems like a big potential win to scan multiple indexes in parallel.

What do we actually gain from having the other parts of VACUUM execute
in parallel? Does truncation happen faster in parallel? ISTM we might
reduce the complexity of this if there is no substantial gain.

Can you give us some timings for performance of the different phases,
with varying levels of parallelism?

Does the design for collecting dead TIDs use a variable amount of
memory? Does this work negate the other work to allow VACUUM to use >
1GB memory?

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#27

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Claudio Freire (#23)

Re: Block level parallel vacuum WIP

On Sat, Jan 7, 2017 at 7:18 AM, Claudio Freire <klaussfreire@gmail.com> wrote:

On Fri, Jan 6, 2017 at 2:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

table_size | indexes | parallel_degree | time
------------+---------+-----------------+----------
6.5GB | 0 | 1 | 00:00:14
6.5GB | 0 | 2 | 00:00:02
6.5GB | 0 | 4 | 00:00:02

Those numbers look highly suspect.

Are you sure you're not experiencing caching effects? (ie: maybe you
ran the second and third vacuums after the first, and didn't flush the
page cache, so the table was cached)

6.5GB | 2 | 1 | 00:02:18
6.5GB | 2 | 2 | 00:00:38
6.5GB | 2 | 4 | 00:00:46

...

13GB | 0 | 1 | 00:03:52
13GB | 0 | 2 | 00:00:49
13GB | 0 | 4 | 00:00:50

..

13GB | 2 | 1 | 00:12:42
13GB | 2 | 2 | 00:01:17
13GB | 2 | 4 | 00:02:12

These would also be consistent with caching effects

Since I ran vacuum after updated all pages on table, I thought that
all data are in either shared buffer or OS cache. But anyway, I
measured it at only one time so this result is not accurate. I'll test
again and measure it at some times.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#28

Amit Kapila

amit.kapila16@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#25)

Re: Block level parallel vacuum WIP

On Mon, Jan 9, 2017 at 2:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.

I got some advices at PGConf.ASIA 2016 and started to work on this again.

The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page.

How are planning to ensure the same in deadlock detector? Currently,
deadlock detector considers members from same lock group as
non-blocking. If you think we don't need to make any changes in
deadlock detector, then explain why so?

Thank you for comment.
I had not considered necessity of dead lock detection support. But
because lazy_scan_heap actquires the relation extension lock and
release it before acquiring another extension lock, I guess we don't
need that changes for parallel lazy vacuum. Thought?

Okay, but it is quite possible that lazy_scan_heap is not able to
acquire the required lock as that is already acquired by another
process (which is not part of group performing Vacuum), then all the
processes in a group might need to run deadlock detector code wherein
multiple places, it has been assumed that group members won't
conflict. As an example, refer code in TopoSort where it is trying to
emit all groupmates together and IIRC, the basis of that part of the
code is groupmates won't conflict with each other and this patch will
break that assumption. I have not looked into the parallel vacuum
patch, but changes in 000_make_group_locking_conflict_extend_lock_v2
doesn't appear to be safe. Even if your parallel vacuum patch doesn't
need any change in deadlock detector, then also as proposed it appears
that changes in locking will behave same for any of the operations
performing relation extension. So in future any parallel operation
(say parallel copy/insert) which involves relation extension lock will
behave similary. Is that okay or are you assuming that the next
person developing any such feature should rethink about this problem
and extends your solution to match his requirement.

What do we actually gain from having the other parts of VACUUM execute
in parallel? Does truncation happen faster in parallel?

I think all CPU intensive operations for heap (like checking of
dead/live rows, processing of dead tuples, etc.) can be faster.

Can you give us some timings for performance of the different phases,
with varying levels of parallelism?

I feel timings depend on the kind of test we perform, for example if
there are many dead rows in heap and there are few indexes on a table,
we might see that the gain for doing parallel heap scan is
substantial.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#29

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Amit Kapila (#28)

Re: Block level parallel vacuum WIP

On Tue, Jan 10, 2017 at 3:46 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Mon, Jan 9, 2017 at 2:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:

On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:

On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.

I got some advices at PGConf.ASIA 2016 and started to work on this again.

The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page.

How are planning to ensure the same in deadlock detector? Currently,
deadlock detector considers members from same lock group as
non-blocking. If you think we don't need to make any changes in
deadlock detector, then explain why so?

Thank you for comment.
I had not considered necessity of dead lock detection support. But
because lazy_scan_heap actquires the relation extension lock and
release it before acquiring another extension lock, I guess we don't
need that changes for parallel lazy vacuum. Thought?

Okay, but it is quite possible that lazy_scan_heap is not able to
acquire the required lock as that is already acquired by another
process (which is not part of group performing Vacuum), then all the
processes in a group might need to run deadlock detector code wherein
multiple places, it has been assumed that group members won't
conflict. As an example, refer code in TopoSort where it is trying to
emit all groupmates together and IIRC, the basis of that part of the
code is groupmates won't conflict with each other and this patch will
break that assumption. I have not looked into the parallel vacuum
patch, but changes in 000_make_group_locking_conflict_extend_lock_v2
doesn't appear to be safe. Even if your parallel vacuum patch doesn't
need any change in deadlock detector, then also as proposed it appears
that changes in locking will behave same for any of the operations
performing relation extension. So in future any parallel operation
(say parallel copy/insert) which involves relation extension lock will
behave similary. Is that okay or are you assuming that the next
person developing any such feature should rethink about this problem
and extends your solution to match his requirement.

Thank you for expatiation. I agree that we should support dead lock
detection as well in this patch even if this feature doesn't need that
actually. I'm going to extend 000 patch to support dead lock
detection.

What do we actually gain from having the other parts of VACUUM execute
in parallel? Does truncation happen faster in parallel?

I think all CPU intensive operations for heap (like checking of
dead/live rows, processing of dead tuples, etc.) can be faster.

Vacuum on table with no index can be faster as well.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#30

Masahiko Sawada

sawada.mshk@gmail.com

about 9 years ago

In reply to: Simon Riggs (#26)

2 attachment(s)

Re: Block level parallel vacuum WIP

On Mon, Jan 9, 2017 at 6:01 PM, Simon Riggs <simon@2ndquadrant.com> wrote:

On 9 January 2017 at 08:48, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

I had not considered necessity of dead lock detection support.

It seems like a big potential win to scan multiple indexes in parallel.

Does the design for collecting dead TIDs use a variable amount of
memory?

No. Collecting dead TIDs and calculation for max dead tuples are same
as current lazy vacuum. That is, the memory space for dead TIDs is
allocated with fixed size. If parallel lazy vacuum that memory space
is allocated in dynamic shared memory, else is allocated in local
memory.

Does this work negate the other work to allow VACUUM to use >
1GB memory?

Partly yes. Because memory space for dead TIDs needs to be allocated
in DSM before vacuum worker launches, parallel lazy vacuum cannot use
such a variable amount of memory as that work does. But in
non-parallel lazy vacuum, that work would be effective. We might be
able to do similar thing using DSA but I'm not sure that is better.

Attached result of performance test with scale factor = 500 and the
test script I used. I measured each test at four times and plot
average of last three execution times to sf_500.png file. When table
has index, vacuum execution time is smallest when number of index and
parallel degree is same.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

parallel_vacuum.shapplication/x-sh; name=parallel_vacuum.shDownload

sf_500.pngimage/png; name=sf_500.pngDownload

�PNG


IHDR��,�)PLTE���������������@��  ��� �@����@����`��`��`��@��0`��`@@@@���`�``�`���`���@��`��`�`�����` �```    @@ @�`� `�``����@ � ����������  ���`����`�����`�@�@@�����`���������������������``������������� ����  � �� �  �@ �@��`��`�������@��@��`��p�����������T&�siIDATx������*�\�����>��
��-T��$�����4�ii��v���r��7���r1I����*�$����	�N��u"�p(�m��������A�r���Z+|��s�m�V�������o�t[{��5v���d���0(�;M�)~<����8t?(�H�iD�~m���p��x,�e:E����������ag/oSx�:����4�b+����7�s���v��0/�?�Z�������{��<w��73`�����T�-����vNy���v���"`?�Nw)~<�����o�����������	����g�M�7Y���N!����a�Qp�(4��=y��	������K��`����hG��0�g
������S�@@ch,F�����J�Bg1B�\��xA*Bi1Bw�VD���;G�;v;��E�����" L�[�pu�1�I#��;
F@S$[����(��r�gB�n�����1�b����t����O>������Cv#�����zG����k�=�[�p�e��9��W,%z�31��#4S!}s}e�h�K�"��-F��{m��2�4}�V�c�^4C�����������j�,�#\�yS� ��h�t�����������%��
�-FX�y$�(�
4F����V��/@���1r��4������
���r�
B2�N�d��}1�K���#��h�-F�. � �b���;v;��E��D@pH����_�`F@4A��i@@PA#s�
�)y�@8AUT�$ �UTA@P%��S���d�1|��i�
U^8��q�"h�y+`�Y����K[�Z�`d������np�����k����#!����.x�����@8�����A����]��A@�&i"z+ ��ls�
����a���! H�kc ���b H� ��  ��|I�k ���b �IQ!.@@P%�e����9y�@����tB ����J���	�32���%Oq"B ����J��l\�	~r���!W�JB xA@P%���i/��@p�& !|  �"$��6����@w���M��*�:}��+}0l)��6������oD�r���.�G2�" ���iV�%���#���@����Jf'T�@X�- !6  ��]��@�1f�HUD@�PaBC@B �  ��"�?F��@��3�(	��! �	H�-	�0��������R�ZQ���A@PFS@�`�\�h!��g����*`�� ����>��u#W#��/"A�F@P�N�z�����a�*`���0�H��0���&�4VO��Q[�o�������B`����u.`k+�=i7�'���"k&\����7&V�1�fbfB���X=��3!s�,��4!�vb"`�4!�v�V�$8D����f�#��jQ�X7�iB`�h�a�r��0X7��a�fbV��9\�Ec`��,F��D# ����������X5!X3�0�W�``�X�X12�/�t��af��"W��Au����D�|o���*��)��{Gh��X%r%z��[�w�X-�>���^���{/ ����O�0��z1��&��*����0" !�V�	��U�^f+�������1�",\2B��b`E����A�:��
3�
���`%
C�����&���s@��%�/������	I�����.x+ V�*v�0X%	s��  �Z&��s����e+ VA'���Q(����)R�u7o�!�BR�v��q�!`�$K�t����o/ V@h}����6�?B`�	��5T����!VG\�������GN�f[���9`m�^@,�.x-��.F��|� �&"�|+�j_����Q��q0��Lf���D����D�u.�{9�``������a��5����&�����,�b`�$�&�W|@@,���a�f�h�_�^@,��%���&�9��%Gi���qX0�����R�e�sy�_`�,E��
��B F��l/�#4X4�*#�6~��,�<��X0�*��ps��e��x"�8F���U��M�IG*�8�UH}�����>)���O�
���ej�ztc������V��fx�E2�����`<����y�<���0�
8���y�"��7��U���~Y�F������,C��������C�����
L+��'����f2p��,�W��������s=�q��S#z|���eg�0+rR���o:�@=g"'^��!2�������N��8Q
'M��U�gB�������8<�M�"�;2]��h�Mg�qJ4���a0��������ZB������K�L���y|M+u�X���������c()�6����E���H}c�m�h��0�@�A*��`�4L��Uq�g� D(
3P��Yz�"�GS�����4L+8od M�9�T�A@x0
��D_n�������<
�I�aBD���U���@��0[0����b�&��+"`�D({
1�=�b �v����&�@,�4L��^�-j���1�z�l��n�GN���Gn
D���
���k'T���S��kGH��>�]��X1r%z��w%z����e#%��.+G.�P@�Q��0
V��T����X4����1  !�fT� ���X2K�����zA@P����"`�|A@,#kA�����ca���2�d<���H91�L���6�od�N��~�$��E@�b8��?K�&�K����\���h`��?I1�@��v�K��3<��Z�g�1��G�]�[����>7�M�7o~�����(�!�_���������p�����c�12����Z���n<�S��m��d����~����� 7!�5�yr���(�9����~y��{`�'�&���Vz��Oq��(�m�n���8�=p�6�/��y���u�`�J��8�w���t�2o�p����v����B����}f��~}�=����<�?���W���!���A��#W}0
f��?��y�d����G5�����s@,LF�x��V{]�W�~��>����
y�
B�Y���}
_rb����s@�4L�����_��h�t����ra������G�a�2�7����3B�Xy��1)�����KH�T����������c��M��G�h�����lG*����*
�7cT@�b�Z4In���j)*�-�0k��U�](�|?�g������a!�+���0��������y3u8a�����1�f�
���S�����7h[X�y3�T1p@�����$�f��L]r�x�7cY@]G�Y��5o��T%d�����-K7o���V9���z���#B������'��S���N�-�=GvF��m�yw �S�������W�y3��l�����[x�������vk��c]@��5���o<���5�|�/��y���zG�����0[��~���7Na^�c�0~,����Kz��_2Y���YY��%�T�/�{���������>�%z��%z9���:����N~�A�r�@@��{*���s���6�b����b�p���4�@�cg��>�#�~"z ��}j����c�������v������:F��a�y>!���
�b��������8��(����&J�+6��+���gl��H># N��+���P>" NJ����X>! N�����`�  �*�  ����
�*�  ����
�*�  �����\}��6Mq"(�"�M��lP.B���G�v��jDC����n��������Q  ����P,$�AU�X��H;���Pi# �
��`� \���O�A����`B!lPI�`!,����D4��z��`a�p	+�����?����#�"`��p�'!�M������#���4�s�1.��$��clz�L��H��������'|��\���"�S�-`�[����O�B����S|
�P���	3��"�\z�_ �QP�;'���	/�P�9��;j��	�\��0��h�����L��Ds(CM�9i�%z���-�p�h����2����t�������������"m����mM�'��!��TKks��U��D
O���5��G�1rm�7�t���'������%���hc�&/����I�)�Yz��wl����&�`��E�M!���R|n�]�xsr���F���56�(����,�+�@0���v��9'�zjz~i���/��3P��8��	�a1h�dK@k���������}*,�o� 	�8}����n��}r}����;?���������=�����k�r�7�/���ps�9���[<�[;�}�'�	�{�DB8������8����3���C!���B�]���OU�+IEND�B`�

#31

Claudio Freire

klaussfreire@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#30)

Re: Block level parallel vacuum WIP

On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Attached result of performance test with scale factor = 500 and the
test script I used. I measured each test at four times and plot
average of last three execution times to sf_500.png file. When table
has index, vacuum execution time is smallest when number of index and
parallel degree is same.

It does seem from those results that parallel heap scans aren't paying
off, and in fact are hurting.

It could be your I/O that's at odds with the parallel degree settings
rather than the approach (ie: your I/O system can't handle that many
parallel scans), but in any case it does warrant a few more tests.

I'd suggest you try to:

1. Disable parallel lazy vacuum, leave parallel index scans
2. Limit parallel degree to number of indexes, leaving parallel lazy
vacuum enabled
3. Cap lazy vacuum parallel degree by effective_io_concurrency, and
index scan parallel degree to number of indexes

And compare against your earlier test results.

I suspect 1 could be the winner, but 3 has a chance too (if e_i_c is
properly set up for your I/O system).

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#32

Claudio Freire

klaussfreire@gmail.com

about 9 years ago

In reply to: Masahiko Sawada (#30)

Re: Block level parallel vacuum WIP

On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Does this work negate the other work to allow VACUUM to use >
1GB memory?

Partly yes. Because memory space for dead TIDs needs to be allocated
in DSM before vacuum worker launches, parallel lazy vacuum cannot use
such a variable amount of memory as that work does. But in
non-parallel lazy vacuum, that work would be effective. We might be
able to do similar thing using DSA but I'm not sure that is better.

I think it would work well with DSA as well.

Just instead of having a single segment list, you'd have one per worker.

Since workers work on disjoint tid sets, that shouldn't pose a problem.

The segment list can be joined together later rather efficiently
(simple logical joining of the segment pointer arrays) for the index
scan phases.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#33

David Steele

david@pgmasters.net

almost 9 years ago

In reply to: Claudio Freire (#32)

Re: Block level parallel vacuum WIP

On 1/10/17 11:23 AM, Claudio Freire wrote:

On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Does this work negate the other work to allow VACUUM to use >
1GB memory?

Partly yes. Because memory space for dead TIDs needs to be allocated
in DSM before vacuum worker launches, parallel lazy vacuum cannot use
such a variable amount of memory as that work does. But in
non-parallel lazy vacuum, that work would be effective. We might be
able to do similar thing using DSA but I'm not sure that is better.

I think it would work well with DSA as well.

Just instead of having a single segment list, you'd have one per worker.

Since workers work on disjoint tid sets, that shouldn't pose a problem.

The segment list can be joined together later rather efficiently
(simple logical joining of the segment pointer arrays) for the index
scan phases.

It's been a while since there was any movement on this patch and quite a
few issues have been raised.

Have you tried the approaches that Claudio suggested?

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#34

Masahiko Sawada

sawada.mshk@gmail.com

almost 9 years ago

In reply to: David Steele (#33)

Re: Block level parallel vacuum WIP

On Fri, Mar 3, 2017 at 11:01 PM, David Steele <david@pgmasters.net> wrote:

On 1/10/17 11:23 AM, Claudio Freire wrote:

On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Does this work negate the other work to allow VACUUM to use >
1GB memory?

Partly yes. Because memory space for dead TIDs needs to be allocated
in DSM before vacuum worker launches, parallel lazy vacuum cannot use
such a variable amount of memory as that work does. But in
non-parallel lazy vacuum, that work would be effective. We might be
able to do similar thing using DSA but I'm not sure that is better.

I think it would work well with DSA as well.

Just instead of having a single segment list, you'd have one per worker.

Since workers work on disjoint tid sets, that shouldn't pose a problem.

The segment list can be joined together later rather efficiently
(simple logical joining of the segment pointer arrays) for the index
scan phases.

It's been a while since there was any movement on this patch and quite a
few issues have been raised.

Have you tried the approaches that Claudio suggested?

Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#35

Robert Haas

robertmhaas@gmail.com

almost 9 years ago

In reply to: Masahiko Sawada (#34)

Re: Block level parallel vacuum WIP

On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.

I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.

From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#36

Masahiko Sawada

sawada.mshk@gmail.com

almost 9 years ago

In reply to: Robert Haas (#35)

Re: Block level parallel vacuum WIP

On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.

I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.

From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.

Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#37

David Steele

david@pgmasters.net

almost 9 years ago

In reply to: Masahiko Sawada (#36)

Re: Block level parallel vacuum WIP

On 3/4/17 9:08 PM, Masahiko Sawada wrote:

On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.

I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.

From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.

Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.

I have marked this patch "Returned with Feedback." Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.

--
-David
david@pgmasters.net

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#38

Masahiko Sawada

sawada.mshk@gmail.com

almost 9 years ago

In reply to: David Steele (#37)

Re: Block level parallel vacuum WIP

On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:

On 3/4/17 9:08 PM, Masahiko Sawada wrote:

On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.

I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.

From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.

Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.

I have marked this patch "Returned with Feedback." Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#39

Masahiko Sawada

sawada.mshk@gmail.com

over 8 years ago

In reply to: Masahiko Sawada (#38)

3 attachment(s)

Re: Block level parallel vacuum WIP

On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:

On 3/4/17 9:08 PM, Masahiko Sawada wrote:

On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.

I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.

From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.

Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.

I have marked this patch "Returned with Feedback." Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.

Thank you!

I re-considered the basic design of parallel lazy vacuum. I didn't
change the basic concept of this feature and usage, the lazy vacuum
still executes with some parallel workers. In current design, dead
tuple TIDs are shared with all vacuum workers including leader process
when table has index. If we share dead tuple TIDs, we have to make two
synchronization points: before starting vacuum and before clearing
dead tuple TIDs. Before starting vacuum we have to make sure that the
dead tuple TIDs are not added no more. And before clearing dead tuple
TIDs we have to make sure that it's used no more.

For index vacuum, each indexes is assigned to a vacuum workers based
on ParallelWorkerNumber. For example, if a table has 5 indexes and
vacuum with 2 workers, the leader process and one vacuum worker are
assigned to 2 indexes, and another vacuum process is assigned the
remaining one. The following steps are how the parallel vacuum
processes if table has indexes.

1. The leader process and workers scan the table in parallel using
ParallelHeapScanDesc, and collect dead tuple TIDs to shared memory.
2. Before vacuum on table, the leader process sort the dead tuple TIDs
in physical order once all workers completes to scan the table.
3. In vacuum on table, the leader process and workers reclaim garbage
on table in block-level parallel.
4. In vacuum on indexes, the indexes on table is assigned to
particular parallel worker or leader process. The process assigned to
a index vacuums on the index.
5. Before back to scanning the table, the leader process clears the
dead tuple TIDs once all workers completes to vacuum on table and
indexes.

Attached the latest patch but it's still PoC version patch and
contains some debug codes. Note that this patch still requires another
patch which moves the relation extension lock out of heavy-weight
lock[1]/messages/by-id/CAD21AoAmdW7eWKi28FkXXd_4fjSdzVDpeH1xYVX7qx=SyqYyJA@mail.gmail.com. The parallel lazy vacuum patch could work even without [1]/messages/by-id/CAD21AoAmdW7eWKi28FkXXd_4fjSdzVDpeH1xYVX7qx=SyqYyJA@mail.gmail.com
patch but could fail during vacuum in some cases.

Also, I attached the result of performance evaluation. The table size
is approximately 300MB ( > shared_buffers) and I deleted tuples on
every blocks before execute vacuum so that vacuum visits every blocks.
The server spec is
* Intel Xeon E5620 @ 2.4Ghz (8cores)
* 32GB RAM
* ioDrive

According to the result of table with indexes, performance of lazy
vacuum improved up to a point where the number of indexes and parallel
degree are the same. If a table has 16 indexes and vacuum with 16
workers, parallel vacuum is 10x faster than single process execution.
Also according to the result of table with no indexes, the parallel
vacuum is 5x faster than single process execution at 8 parallel
degree. Of course we can vacuum only for indexes

I'm planning to work on that in PG11, will register it to next CF.
Comment and feedback are very welcome.

[1]: /messages/by-id/CAD21AoAmdW7eWKi28FkXXd_4fjSdzVDpeH1xYVX7qx=SyqYyJA@mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

result_0_to_16.pngimage/png; name=result_0_to_16.pngDownload

�PNG


IHDR��,�DPLTE���������������@��  ��� �@����@����`��`��`��@��0`��`@@@@���`�``�`���`���@��`��`�`�����` �```    @@ @�`� `�``����@ � ����������  ���`����`�����`�@�@@�����`���������������������``������������� ����  � �� �  �@ �@��`��`�������@��@��`��p��������������������___???ccc�����0�IDATx��������F��L����y��G���r)�{���0 �����e@J�������*S?�<E>X�E����(�A���v���i������q�z�y�����?R<��P'm+���*�m(��z�/t������41.������!�&�����1uO�W�x���O���ia!��_Q�'����0���r��(������7���&pR��5dC�i<x����j�4]\��g���3�/ �������~y���f�P�U7n�l��l�g���n���R��dY%������	Dak/����2�����k�g�� ������������������_@@�a�.��}��{m9?[���V��W����rJ�E@`��+�������w�������! �f�l�Ai+�f��%����������Fo��~�)��
MH#`ON�{����3IX�b��� `�bu}I�SV��g�*�X�]
6���AUv�>�����X6Y=�N����u�=�������$|��W�{��v�@@��������#`S��0�(�EXw��S�;�!����j��r�����Nh�dX��H�pG>3������T�_��01�I��;���[�q�00X
�vG��-�8B.�����]N�B@0q=j������tG>b@_��>RUM^T�Vv���
��~j�������Q�W8	���	Dab@_@@�������1/`1.��on��f�t�&���F��,�on��	�!�c��B<t���q�.��pn�^�t��nk/��B�p�k���.�L���.��:��X�����	�EC0�csG�q���_���V?{��!C-�/?��g[�];7�UC@L[
F�=�����z���N�����W���������F�2��rp�]>�yB�`������N�����S��oT������>�Sl(�v~P�o}���<Q�es=>*�D[�dL�vr-��n-B�#`����/iiv�[��'�0q�s�S��(b�~�+D����V�tL��Y�#G�����F
���8�_j��c��[	�!�c�;J��[��'jIu�_(��v�t��;	����@Wt�:�mTl�������u���G�F���}��Tl�����|��W�~�/ ���W��@���W�~a& ������2����|�.E�lZ?A���0?��-���^ko�b��o�����V��j'��M�uv�g���v@�Di�����_oG�w�S	L�:��^�_oG~��
�7
�Y{�dR����g�kN������=�����M��v�G4*5�R��{7�����%�)����pW��R��@�X�06���.�v��D
;a`Z@@��&�!d�U�_��P�e>����.�O��n)�*���?:Ez����@�����>����h`�
x����;��w���'�7����M�$��=��6W�����'%S���S�'�w�SC�m���]� {�#cU�o��g������V��U��=�\!+��'`�_�y��o-cz�H�NeL���	a-c�*Q:�1}b/`�Q���b-c�"G!��p:��/�2�+����G0'����d�j:a�Lk�!`(�
�eL��������R�?���gAv�1�j����V3��X�����X�d���J�NfL����2�����' `:����J�NfL��.�q���i�MW�_�����1��0�(,��!����j�A�@@��<�����.�v��sp2@@��<���	Da)�MW�_���@��A(84d50�(,��!����j�A�@@��<�����.�v�psp"@@��<���	Da)�MW�_���@��A(85d60�(,��!����j�A�@@�KyB�!g�@3=3�9<P4:"�p����VuO6�E�dT
A�D����"�%��7�0�3]+{�j�"DN?�u�Q��������e
0'���E��X�.���:��|z�������S����1��iY�M�J��ntCh1��M�_M��w�����W�k��R���X�s/#M~{
��t����y�����?���~�c.1HM?��eC�o�,{���V���Z��������.e+{�lL�'�����3�]W���	�Z���i�Z�@����b�����������!^'�J�@�,BR������H2��A�[�_���6���	`�D���@��q"�TC �B�]��	Da)�n�D�C ����b�D���`
N�'�
6&����q����@��&2)`QX��?8�U=]od��v�<88c�!�����8c�!0~��g�7� �p����
D�f��Ve���]Y4F��6d00���E���]���:�D���0=uivCL��'��#`��eQ�UQvf7E#L���a:��j��
C@_����nL�	`����@�4�|Yf]Z����L�@�r��/J� ��������Sp��A�p�,3��.-twYf�0��y����j��#m\�I4sp�hX��������9�6y�ES��� �A��U>]W���@R����k��]���/���o��')���k�����}9�������Q�)��[��Wq0Y�8eLW&J?fL��	*��T�+����D�������/�(,��u� ��9c�*Q�)c:�X�_4,6�2�bU�(<'+�W���#������Tei��k��c���c�t2O�<a5�DP1�-D�tc&��\�����Sp�=dL��'�F�tK?,�h7[2�k-B������#���s�tU�ty�t�y���l��� ��9c�*Q�<c:�CP1����%c�*Q�<c��@����)�n�D�C �������D���ts"� �nNt6�'�0�zk������x���kgus"B@��x!:�3|3�nN����� �B�m��H&�h���x���k7�]�S ����b���w
��`L�����������p! �b�MC$NM?O�r��c���'�9X��f��JC0'��{D�v����(y�/87�!�"�0n�Z�E�Y�rpm��*�_���k<��5������-���������1	G������h+[&L�1`Y�eU�F@h��b�����_[�On���
��8/@�Z'�����[��� `�Cg#U�B���c��"������������������b@��a���
���-@�ZF@��a2��v�p�
�]������n�,C��n��N	-��&;
xj��*�_���k��
�]
8�A�"D��a0���0���0��_%����M��3�g����z7LU�����}�����B@Y�T�g�]����.n�c2���
�-��]��{S��A���
7\�����AS�&o��"C�A���Z�����)�p�WJ�;��mY7����z��Ka�9��Gire8�U�m��4������)
K�vk�4��)x`I������c�����x��TW�!���E�� 9��@W@qro9�'e�aE�~��1}L�7���&/����v�6�3�o���*��I�bg���X4����e5�X��-O�vf7)_���j��`
���U�_���� ���$50�(,��C0�M��,B�l
�42l��i��0����w�t7��(,����``h���1c�X�����0�)�����t���L���kg,�d��>�������j�Y8<�P��0`M\b|b@��P������U�Y��@�>����� `�R@m0�D�����Q@�b�,���|N�>�(�Fa)��C���\��c�1��A�"�?�\���9�p0�����������0�! 5�2
�1`&b@�(|�1�
d��\;�U��X���N2=X�0@�
~3`fsa`�0���A( |�1���{��@���� `���!6�`���I@@��N��������~���@���s�tU�t2c�I����&���G\�W1`d��U�����g��K���/��,B2gLW�(T$+<�PL�<�����YZ�eRS�&0�%�T��V%�>8�'�U�!r���,4.��e
�:�+8"��J��c�H�������]Z�y��<��}�_�:�'�1������f"?-atSp�XP�(]�1�����'�AV�?=eLW%JWfL�g�P��"n�,\2����2��7]�
���K��2 2��6�`��Or�~��)+��^�z�4�c�/���Z���``�������}:��#�H�@��������o>���	(�����,���t��>0������h|�!!b@)��>}��W�����P�o����}��%od���_�x��B��{�
���� *���(!b@)�V��7�.;��WB@��!��z}�>K-�_@���)����`��o:v[�00��o�P@�f;	���c�R9�O��/a2���@h1
�\�����H!l����|.�?�@����O�
�����{���s��Yx�w_���VS�������K=)x����������l8#e��7��a�a2<��rA�1��!�q$1�~��YhZ�
�#��O�Pb 3x�.G�/�$<�N������`�K�HK�ln�B������`7�`i���w��*��^	��P�6d;�pB��A��qCw���I�0oX
�1���0�3<\�a�@%����~����P�c�R���@�������u>���KRV�/|���@`f
g�A�3b��}��jX8)�g���
� ����4��
R��A.���'�;K��1)���_'F+`U�ds^�J���!����2	?��ef_���Y�
�q2_���*Y�a��p]+{�����'�"����\|�OU��;�+��X�8��Ww�jf�<�����5j��_�,�[ZM=e�bX�C@�q"��|��w����=�Q!o�W,h��u�99&�����,e������(��E��"�r��R��2�1��r�1 1����v%������*��_i4�T��������E����g�8�H�����8�������	��%���@���)�����N������t����#�/�����*x����"� ��^�?#_�)k�X��	���T��������0�']�V�`�}��Q�Xny��:�j,���q�|ZE��}�����k��
��������A-B��.�)�)�.��B<*���?X�D�k��� v�w6
Nas��4$9"�
�b�4k������/�Q����ip
����!��P�0��!����j�A��9�l�����iHr  L�vh� ����_@5l�;����9�l�����������e	Da)��C�lh�S�_	�Q
��a% LfBA� �
9(�@�r�<(��A�"���`"N�B�T`+�� D�b@��!BAR�x���kg-�PZ��/�8nps31C�x	B�����P�����XHZ�r��<7}�z�W�(��;��U�l��r�%���N-�C��������#����O>��%��Q�#}R���I���3�g��O��=4I��^���n��cs�-��!-���P���9E[����A���;b�'�������)�V@���~�U�E"��y/{��AzxP�����8C�1wH2�bG�G�����+Zi]�e�w+��$��W]^�������R������\zw��\r����^�%�����Tp����c���s���������X�I{�������X�M]����"K�g-��{��*df�_w�:�������..�;�.9���C/��c��o���\p������_
N�_JN�_J�����D�G_�6�\@��B6�]v�j��J* �����+���d/�n�7`�����������+��_J�������s����+����f���������, ��C��15����[��4��y�]D�(��)�\�{r����K�����������M���s����+�������\r���dS ~���(������T�k�V�����������c���z��W�cM��o���F��(��q��B�M���_�h�����I��;�:�� ?���H>B���tDo[��y�1�<�r��
$&�>��sI8�C�(��
�_J���N���=�v�������������(��P{g<��k�C�r��^�m%�l<��aV�����we��dR�c���Cf7N��<;�n�����O��#�s���������,=�3�dy�5��J^AL����g�������S��)����o�*�
8�����������qN���0v���
D����UT��!HW��gC����u}q��%���S�������b��6�z+I�����0u��"D�G��&�?q�@:�4yu���;�^!>e�\����Y�^E���M��%���S�����?�r����S��������\:>��,�� ��WU��3q]Wg��3�D�������mC��#mE�9��HV�P��������?��������{���{�m���S�w���������]��X:���j�>9kNR��x�zd�(�����h������!��Ek����K��_�~��������E!{�����jd��/%���%�`���P������K\���� t�a�{U��w�{�e9��'���5�5y{���}��b�"�	����n�YSn���=���t
���g�Y�z�J��WN\r6����7��pq���Rz8n�
Xd��z�.v��������Gq���*c��
Xg�������v�u^6�K��C*`�;;�+�m>����7�68������  x�T����,�n���1 x�T@q'��v�y�����e���K����K:��<����y@��7!������v`��������9Q��IEND�B`�

result_all.pngimage/png; name=result_all.pngDownload

�PNG


IHDR��,�GPLTE���������������@��  ��� �@����@����`��`��`��@��0`��`@@@@���`�``�`���`���@��`��`�`�����` �```    @@ @�`� `�``����@ � ����������  ���`����`�����`�@�@@�����`���������������������``������������� ����  � �� �  �@ �@��`��`�������@��@��`��p��������������������___???������cccWP�S�IDATx������(F��Z��t�����AQD@P�|NwW[R*$�M�lh���M_��Q�e��l���G���V�``UV��l���T��}?�h�����Y7D[L����o|j��l���Y��U����qk|Afk���L�?-��d���U�]
���p���
.���%`+����c��L����8�T����C<�k�;����&����Gi�OY5�Mf�Ftk�8��>W\�1�(K����Gq}��q&�I}���c�mc���}@���|�%����v�,!�ull���V  �BU�a���}�bzJ%��,;. ����4TA"@��F0�%4Te��}���6�]�qo;=a�<;~c�G�������Kh�B��U��������N��aWU�V��	"P�����b���yr�~��~�>����$Y@i����T�sez_7���_	�2
���+f�,�����e�@��
������
�6�������d0-;�p�W��9@�M������7y�D`�l��'�.m/����lo�_	���@@p+�������AM`�v@@"9��@@p+		�y��6�<���#�!��.ac���q����1��u�AdZI�G�S�����xnD����JBf�@���
xrD�w��|�%�(�����_J���t�_|���*�%���|�����GG��
�Z���b^�"��|�����%l�36_8"��������$$`�
��oD�|�%(��� G"(
�-�M�J���!��/�<��*��ht+��VL���P�w����C�U�(��;�i^)]�b:�����h���3>�`\8_%�����O�<M�-Rh\�F�������OY�%�m�hG����.�Z��iF�q��4�2_L�5��e���v�:�L���n�jx����[�������-����z���l'�  U�����$��7
��w�
��)�?�D��u�T��/��9��+~������L	�0}O���m&����w�e����~l��hEZ1]%�dG'��%�����4�_v�f�����+�X���>e�ZVJ7���*8>��p=_�������^�D�-
��2��n\1�FQ*h�?�h5
�
������b\)��b:�M�/��,�e�������u3����a0���m@����2���O�>�9 `�KP$!A�@@p+�JBf�@C�c�4fn$�Q@�#���ww���KU
�n�����7�.o�/�h��0�%V�/�0\'��1"?%AH�v������/~G������mR|����O+T��q/�0�&�������7��m�vn���F��w�MD ��.!QJ���&����c	2��S�N��7"�]tUp�����!`|�^<"?%3o�a4 `�KP���������$$`�
4���]�"		B�vu)���k���@�K��&�Y�����MN�����7��m��sL>|�%d����6NNaL���~<��
:('�"�����\%���0&+���}~L��+��$$`�
�k���g�/����Xfn��`M	hL~�����]�vL�����+"�L7��>#�<��mL~p��������' �.��`��"�y�t���#���7�0&��S7��Vaa\�p�E"tB����o�]��>G�(�
�����nW7��1�� ��|Ed,�p|��]3���o�+���@��+�/3>��L�;!J�y��N��M����+��Iz@��yEd�����'PG��m�O�������y��yEd�����G `p�mZ>��iL�������h�����G `p�mZ��`H����.��n
���������/���p ��.���Y�G�x����'��A���p��yW�i�_�[��T�������ia`�A�G1����\�y��6`Q���+�4�Y�!`�K�m���hu#�M��E/Upp���U]�o��2$�4�Q@@p+		�y��N1 ��.A���n�[IH��h���|)��)��#�!����.1��f|�9�������<sn�k�7E@��1�j0L1��xD>|�x
QS�C����w���(�"��E7�����V%6�B�5"�x3v��O�u	��U	����Y+�c#�s����������Q76�������%��%���(��w����)Y�P#�s0��%m��t���2�\7"������Q�<oJ�x����9B]@�9�JBf�@��|"@�h��HB��P�F�A@@ ��HH��hhA��l`�v@@"g��� ��x��b-(���q�0�fM�������U��
K��@������%���m�#,�"���,���Z&Up0(f�h�e�,�_/X9��tY(}g��	�3		�y
m�P���������������_����2%�?���i������
�r<���-��s��������������-F0g�8�����y���%F���
���0XEup((f#�  ��Q$$`�
4���]�"		�B1A�]���"'3o��
H�9$`��N�	���P�F�����6�V�9�FVL���
���X7 ������N>eg����
��p�<D��d� `�KP�h/8^/Up4(f#��l�����.F00y��u��d�1 �����'�O�~~�>.`�
4�]�h���0�KP�_�����,���:��*8����_���W<
�Fm��������HK�m�����6`�KP�_��^&?������������}5!6���,����0Q����C@u2��@�Gq�Ypuy'm� A'�T@Z��:!�b6B'�A�!�-��+��wtB���7��t=���CP��h���N��P�F��eZ~�F����Mm}-���r|�����#0S|��n��j����;CR�~�3����G�@��a�������B�'WL��(rh@jS5�7afO/V8�:�<�Q����-��t;�b�<��l�4n����1��yO��n�>�����6`X��]�_�+��?�X1�t�1�-:!d�X��a �X��I<���>����U]����X=k�I<N��>�����X����GqgVL�xg�C@'�679������tE�
��sF�
?�a?���P�F	����� ��S~�nU_9=f���te0*m�h��H��0y���KOP�����!F�\�QP�F���xqa`�0������rIp'���0�-R��$/A�Ss*`x:�"��8B1��0"y~@��C/��N�E�/��36�KP�`'�,���<������� `�{�'���H�������C���b�g"��36�KP�S�vs�h���l/�#�\�>�l�/�sDC�(�!`lv��yAS@��S&�
H1c��E|l�:!V��at�s�828�J� ����UuE'F!/j������K&z����l��
���$T��H�*u�1(f#���@@ ��HH�U&z���l��
H�!+�N�J��|��O���# ��x6��/Vxx�t�@%�M��i6��L��Rj�B���w\�����p����
�lM��l������' ��a��}wf�j�"�����'�6�M��4hh��r_1��~X�~c�t�j?����
�����[���b9��oBU���a`X�% ��,��>�b:�������+���f��)���f�"@�h��i�`T�P�F�A@@<��iC[����Ld^�&���D����WZv@@"4�E����� ����.�c��0�LB�
x6�"��o�h
L�H�*x���ua`@ ���i���;��}��
4��pT@��Ch������F%��l���0p��	a`z$$�K�{�8��%(B_��@��e�^�v@@"t����(;a���6F00c�9��
��H��Q��i�h��p�
.P���R��@��B@�		���l��l��
H�02��A#�:����(_a��B0a`
$$���?m��gIC@n����` I;  ����l���0P�����]>K��c3%��U�,G��d�Q���Nc ����;&������:e<KT?=[	�s���U0*���(��m�0=�k5���ivw�s����@�D���%��\�68��d��6��gt��
���+���_�KrK���MX1��QWL�����L�Nw�F-��X����>e��,�M@X<N@F��R��d�M@�#W��d�Y�C�\�6���d��NH,���~uoS��� `4R��D-|
��������r���]b�����$�2P���������
d���@���m;^OBr�<�lDS@��g4-�� �;�/��:%�
�3p�D�����" ���������1�"T�V��j����s�	�l�����O�=OB���Q<�^��$GB�3���� �h^]7*���R�����^NB������1�lDX��
��[]p0��-�KH
�*�2�u����w� ��m�8x	h����W��0����-�T��]mL1%/`���&�r����K��dW�"!�����i(fl��h�����������]@�s�%!
�m������kc�]Pb�����B��%��.�8sM1Xd�7��.:����0� cA�6�YrPV0���:B_@�J�sYM�H�6F����uHF���@@�1:b���`BR#!�s�Q�H[v�h#��kRp2��W���_z�����IH�c\[3����6���1��� ���������t�BBD������	��� ���1O�����3�����h��'`U~����jH�uc/F�\���,�b�^ B�����r��;����������l%�
^�&\q_���6����b��/�Q,�b0�K$�\���dt��r�fC8��xxW%��C�o�ts	8���_�{��E��=��h�*��_���N/�h%�S@�z3�����U��)7+�,/�����b��~������a3������b�[J@oVE�>���-�guB����^K�Y7�m
*��f�U�'`��q�����hc���Y�P����1��%��9�H�������iC��j���F��v��Nn��l"<�#����x��Y�W3��Fa-^�`��������9F/R�H�����{g!,�j�:I
F�����+�K@B}f�*����%5K�G'�>lI�C�Zk�bT0�
�5�U�����eV1�1�z{0�ZX\2��i^\.�kaA�f&.�l�>Y���<:�u����
u%����f��;�'����n�p�#"���&l,$Y=�����1T6z�K"M��M���.���P��h�s������w��DV�*�2����~�/����{���0�'����=8`�����=z?����@��\F�!��N��/���3��U�����:6�7:^����rH�����9r�[$t��r�(n����=��nf*�[�)n.���������~���K+�������o__v8��
x	�,D��B�w.���]@�����{�E��VG�����O���~�����r(��
��#�@U@5��A��?�#��w!�R��B�X�-f:��6�T!42��i���>)f������G�	�{�J!�:RI+�s��=�����02B��VD�p���C
^���'��^��o�v��mkO�BR��������Q��p��t���.��]�LB�M�����������ix
% 1���{� ��6��\8���
9r�(�.c�V�����i/{:7��1��_������j��������� X�m�����Z�,E�������������������'
f�C~t��M0	�Qk�.q�ld��F�|�	�Jg�+��8��<cA}���_~y��}����T���?&�An��|�,����;��O��@$��4J����S|k�:c	xg��}�C~\P�'T~��U�1����9a	06m�m������J���
��w��9~'�����jZ��N�
���r�^�@Dn�������W�����v'�b�&y	��+�q��r<@�{���^(1�E4
hv@&P�F�
�]��0����&�pzc���0�y�p$Sq��Z��O��FF���]����4�]ER_t��������e�?v����GG_�Gq�{m'����<:��P���������\�f�t@��c�g��p���i�L��sU���������5��3����;n�{�)S�.������P��>��&}>@M���}�����TJ��8��;����4u��*?mS����c�6������*�����^s��o����m����ij���I����g��.�o��:���9@M���}P�>G�����n�>��Tz�����b3�f9F+��� ���G�h���a��/���)S�.�����>�i�� �]��Du>@M�*.r�E�&�"@M�y�W~��1�Q�'8����u���H�}rFE�j�H�&�s�r��w%��?���.�o�:�&}u	9�"`�v5@:��]�Q�0~u��#�>��v�M[7^��j}��pP��C�g{�F�u������U��/~��YY%��8�I_�V9J����Si�4D���_���0���te�����]�=WU�����������chw���	8�v����;�N���]�i�$]X�I�r���j�1)i_���������9&`�����OB_�7u�h=�cm!�k��m����v�N@+�g��;��O�m�.b�I�|��v����]4'7i���%��*X�U�|4�H��-��#'�km�/2�&d(�M��*XM�����	����I���T��.�����}�<�*��"5w4��5�t��E��]���eh��m�M���/�&�S�6��O����kY�d������J���������l
����-�>e�mv�������
n���1S��T���Z�|��}G�m�EA�I�8BM���6�"N����5�r9�;���9��^�0��o�OY����}��<��4�K�	������6�1�?�Y;Hwm���k�)S�.�����I���N��R���M�Wq��i_"��} �}�<��2Lc������=�1]E��jh���"�2<Y3�A���x;H9fI��v�&}�]M��Lr�Wg�4�I_���P�>�i_�i_"��2����#.���W1��u�����D+�WE�6��?�����4�_�����G-{=���nU��o�������
8��6G
5e;�c��l�f)�{`��SU�M+�6
�����������A�v<��{�@a���h�w�GS�����{�]�P�&5��!�6���4�h7����<��+��o\�y�2��s�<q?�f�-	�{�l��+�nG������cuU�(�!�6�

�4}�\z��oc!7N
�p�l���s����A�����6��0���i�9Pj���
�dPe7t{���$�����l���"P����y��&s�����������|�[���i��)���R�v��{� *�6tw�M���df����v����?V�i=��fIEND�B`�

parallel_vacuum_v3.patchapplication/octet-stream; name=parallel_vacuum_v3.patchDownload

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 421c18d..b93231f 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
+VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | PARALLEL <replaceable class="PARAMETER">N</replaceable> | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable class="PARAMETER">table_name</replaceable> ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 </synopsis>
@@ -130,6 +130,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">
    </varlistentry>
 
    <varlistentry>
+    <term><literal>PARALLEL <replaceable class="PARAMETER">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable class="PARAMETER">N
+      </replaceable> background workers. Collecting garbage on table is processed
+      in block-level parallel. For tables with indexes, parallel vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a index are processed
+      by particular parallel vacuum worker. This option can not use with <literal>FULL</>
+      option.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>DISABLE_PAGE_SKIPPING</literal></term>
     <listitem>
      <para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 011f2b9..4ee9d88 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -89,7 +89,6 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
 						bool is_bitmapscan,
 						bool is_samplescan,
 						bool temp_snap);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1668,7 +1667,7 @@ heap_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
  *		first backend gets an InvalidBlockNumber return.
  * ----------------
  */
-static BlockNumber
+BlockNumber
 heap_parallelscan_nextpage(HeapScanDesc scan)
 {
 	BlockNumber page = InvalidBlockNumber;
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 17b1038..700286c 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -19,6 +19,7 @@
 #include "access/xlog.h"
 #include "catalog/namespace.h"
 #include "commands/async.h"
+#include "commands/vacuum.h"
 #include "executor/execParallel.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -118,6 +119,9 @@ static const struct
 {
 	{
 		"ParallelQueryMain", ParallelQueryMain
+	},
+	{
+		"LazyVacuumWorkerMain", LazyVacuumWorkerMain
 	}
 };
 
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index faa1812..505f0fe 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -72,7 +72,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
 				  MultiXactId lastSaneMinMulti);
-static bool vacuum_rel(Oid relid, RangeVar *relation, int options,
+static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options,
 		   VacuumParams *params);
 
 /*
@@ -87,17 +87,17 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
 	VacuumParams params;
 
 	/* sanity checks on options */
-	Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
-	Assert((vacstmt->options & VACOPT_VACUUM) ||
-		   !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
-	Assert((vacstmt->options & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
-	Assert(!(vacstmt->options & VACOPT_SKIPTOAST));
+	Assert(vacstmt->options.flags & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert((vacstmt->options.flags & VACOPT_VACUUM) ||
+		   !(vacstmt->options.flags & (VACOPT_FULL | VACOPT_FREEZE)));
+	Assert((vacstmt->options.flags & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
+	Assert(!(vacstmt->options.flags & VACOPT_SKIPTOAST));
 
 	/*
 	 * All freeze ages are zero if the FREEZE option is given; otherwise pass
 	 * them as -1 which means to use the default values.
 	 */
-	if (vacstmt->options & VACOPT_FREEZE)
+	if (vacstmt->options.flags & VACOPT_FREEZE)
 	{
 		params.freeze_min_age = 0;
 		params.freeze_table_age = 0;
@@ -146,7 +146,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
  * memory context that will not disappear at transaction commit.
  */
 void
-vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
+vacuum(VacuumOptions options, RangeVar *relation, Oid relid, VacuumParams *params,
 	   List *va_cols, BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	const char *stmttype;
@@ -157,7 +157,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 
 	Assert(params != NULL);
 
-	stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
+	stmttype = (options.flags & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
 
 	/*
 	 * We cannot run VACUUM inside a user transaction block; if we were inside
@@ -167,7 +167,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	 *
 	 * ANALYZE (without VACUUM) can run either way.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 	{
 		PreventTransactionChain(isTopLevel, stmttype);
 		in_outer_xact = false;
@@ -189,17 +189,26 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
-	if ((options & VACOPT_FULL) != 0 &&
-		(options & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
 
 	/*
+	 * Sanity check PARALLEL option.
+	 */
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_PARALLEL) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("VACUUM option PARALLEL cannnot be used with FULL")));
+
+	/*
 	 * Send info about dead objects to the statistics collector, unless we are
 	 * in autovacuum --- autovacuum.c does this for itself.
 	 */
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 		pgstat_vacuum_stat();
 
 	/*
@@ -245,11 +254,11 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	 * transaction block, and also in an autovacuum worker, use own
 	 * transactions so we can release locks sooner.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 		use_own_xacts = true;
 	else
 	{
-		Assert(options & VACOPT_ANALYZE);
+		Assert(options.flags & VACOPT_ANALYZE);
 		if (IsAutoVacuumWorkerProcess())
 			use_own_xacts = true;
 		else if (in_outer_xact)
@@ -299,13 +308,13 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 		{
 			Oid			relid = lfirst_oid(cur);
 
-			if (options & VACOPT_VACUUM)
+			if (options.flags & VACOPT_VACUUM)
 			{
 				if (!vacuum_rel(relid, relation, options, params))
 					continue;
 			}
 
-			if (options & VACOPT_ANALYZE)
+			if (options.flags & VACOPT_ANALYZE)
 			{
 				/*
 				 * If using separate xacts, start one for analyze. Otherwise,
@@ -318,7 +327,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 					PushActiveSnapshot(GetTransactionSnapshot());
 				}
 
-				analyze_rel(relid, relation, options, params,
+				analyze_rel(relid, relation, options.flags, params,
 							va_cols, in_outer_xact, vac_strategy);
 
 				if (use_own_xacts)
@@ -354,7 +363,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 		StartTransactionCommand();
 	}
 
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 	{
 		/*
 		 * Update pg_database.datfrozenxid, and truncate pg_xact if possible.
@@ -1221,7 +1230,7 @@ vac_truncate_clog(TransactionId frozenXID,
  *		At entry and exit, we are not inside a transaction.
  */
 static bool
-vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
+vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options, VacuumParams *params)
 {
 	LOCKMODE	lmode;
 	Relation	onerel;
@@ -1242,7 +1251,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 */
 	PushActiveSnapshot(GetTransactionSnapshot());
 
-	if (!(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_FULL))
 	{
 		/*
 		 * In lazy vacuum, we can set the PROC_IN_VACUUM flag, which lets
@@ -1282,7 +1291,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * vacuum, but just ShareUpdateExclusiveLock for concurrent vacuum. Either
 	 * way, we can be sure that no other backend is vacuuming the same table.
 	 */
-	lmode = (options & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+	lmode = (options.flags & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * Open the relation and get the appropriate lock on it.
@@ -1293,7 +1302,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * If we've been asked not to wait for the relation lock, acquire it first
 	 * in non-blocking mode, before calling try_relation_open().
 	 */
-	if (!(options & VACOPT_NOWAIT))
+	if (!(options.flags & VACOPT_NOWAIT))
 		onerel = try_relation_open(relid, lmode);
 	else if (ConditionalLockRelationOid(relid, lmode))
 		onerel = try_relation_open(relid, NoLock);
@@ -1412,7 +1421,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * us to process it.  In VACUUM FULL, though, the toast table is
 	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
 	 */
-	if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_SKIPTOAST) && !(options.flags & VACOPT_FULL))
 		toast_relid = onerel->rd_rel->reltoastrelid;
 	else
 		toast_relid = InvalidOid;
@@ -1431,7 +1440,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	/*
 	 * Do the actual work --- either FULL or "lazy" vacuum
 	 */
-	if (options & VACOPT_FULL)
+	if (options.flags & VACOPT_FULL)
 	{
 		/* close relation before vacuuming, but hold lock until commit */
 		relation_close(onerel, NoLock);
@@ -1439,7 +1448,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 		cluster_rel(relid, InvalidOid, false,
-					(options & VACOPT_VERBOSE) != 0);
+					(options.flags & VACOPT_VERBOSE) != 0);
 	}
 	else
 		lazy_vacuum_rel(onerel, options, params, vac_strategy);
@@ -1493,8 +1502,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
  * hit dangling index pointers.
  */
 void
-vac_open_indexes(Relation relation, LOCKMODE lockmode,
-				 int *nindexes, Relation **Irel)
+vac_open_indexes(Relation relation, LOCKMODE lockmode, int *nindexes, Relation **Irel)
 {
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index fabb2f8..fb61c4b 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -22,6 +22,20 @@
  * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * In PostgreSQL 10, we support parallel option for lazy vacuum. In parallel
+ * lazy vacuum multiple vacuum worker processes get blocks in parallel using
+ * parallel heap scan and process them. If table with indexes the parallel
+ * vacuum workersvacuum the heap and indexes in parallel.  Also, since dead
+ * tuple TIDs is shared with all vacuum processes including the leader process
+ * the parallel vacuum processes have to make two synchronization points in
+ * lazy vacuum processing: before starting vacuum and before clearing dead
+ * tuple TIDs. In those two points the leader treats dead tuple TIDs as an
+ * arbiter. The information required by parallel lazy vacuum such as the
+ * statistics of table, parallel heap scan description have to be shared with
+ * all vacuum processes, and table statistics are funneled by the leader
+ * process after finished. However, dead tuple TIDs need to be shared only
+ * when the table has indexes. FOr table with no indexes, each parallel worker
+ * processes blocks and vacuum them independently.
  *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -41,8 +55,10 @@
 #include "access/heapam_xlog.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/relscan.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -54,6 +70,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "utils/lsyscache.h"
@@ -62,6 +79,7 @@
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
 
+//#define PLV_TIME
 
 /*
  * Space/time tradeoff parameters: do these need to be user-tunable?
@@ -103,10 +121,81 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+/* DSM key for parallel lazy vacuum */
+#define VACUUM_KEY_PARALLEL_SCAN	UINT64CONST(0xFFFFFFFFFFF00001)
+#define VACUUM_KEY_VACUUM_STATS		UINT64CONST(0xFFFFFFFFFFF00002)
+#define VACUUM_KEY_INDEX_STATS	    UINT64CONST(0xFFFFFFFFFFF00003)
+#define VACUUM_KEY_DEAD_TUPLE_CTL	UINT64CONST(0xFFFFFFFFFFF00004)
+#define VACUUM_KEY_DEAD_TUPLES		UINT64CONST(0xFFFFFFFFFFF00005)
+#define VACUUM_KEY_PARALLEL_STATE	UINT64CONST(0xFFFFFFFFFFF00006)
+
+/*
+ * see note of lazy_scan_heap_get_nextpage about forcing scanning of
+ * last page
+ */
+#define FORCE_CHECK_PAGE(blk) \
+	(blkno == (blk - 1) && should_attempt_truncation(vacrelstats))
+
+/* Check if given index is assigned to this parallel vacuum worker */
+#define IsAssignedIndex(i, pstate) \
+	(pstate == NULL || \
+	 (((i) % ((LVParallelState *) (pstate))->nworkers -1 ) == ParallelWorkerNumber))
+
+#define IsDeadTupleShared(lvstate) \
+	((LVState *)(lvstate))->parallel_mode && \
+	((LVState *)(lvstate))->vacrelstats->nindexes > 0
+
+/* Vacuum worker state for parallel lazy vacuum */
+#define VACSTATE_SCAN			0x1	/* heap scan phase */
+#define VACSTATE_VACUUM			0x2	/* vacuuming on table and index */
+
+/*
+ * Vacuum relevant options and thresholds we need share with parallel
+ * vacuum workers.
+ */
+typedef struct VacuumInfo
+{
+	int				options;	/* VACUUM optoins */
+	bool			aggressive;	/* does each worker need to aggressive vacuum? */
+	TransactionId	oldestxmin;
+	TransactionId	freezelimit;
+	MultiXactId		multixactcutoff;
+	int				elevel;
+} VacuumInfo;
+
+/* Struct for index statistics that are used for parallel lazy vacuum */
+typedef struct LVIndStats
+{
+	bool		updated;	/* need to be updated? */
+	BlockNumber	num_pages;
+	BlockNumber	num_tuples;
+} LVIndStats;
+
+/* Struct for parallel lazy vacuum state */
+typedef struct LVParallelState
+{
+	int nworkers;			/* # of process doing vacuum */
+	VacuumInfo	info;
+	int	state;				/* current parallel vacuum status */
+	int	finish_count;
+	ConditionVariable cv;
+	slock_t	mutex;
+} LVParallelState;
+
+/* Struct for control dead tuple TIDs array */
+typedef struct LVDeadTupleCtl
+{
+	int			dt_max;	/* # slots allocated in array */
+	int 		dt_count; /* # of dead tuple */
+
+	/* Used only for parallel lazy vacuum */
+	int			dt_index;
+	slock_t 	mutex;
+} LVDeadTupleCtl;
+
 typedef struct LVRelStats
 {
-	/* hasindex = true means two-pass strategy; false means one-pass */
-	bool		hasindex;
+	int			nindexes; /* > 0 means two-pass strategy; = 0 means one-pass */
 	/* Overall statistics about rel */
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
@@ -118,19 +207,46 @@ typedef struct LVRelStats
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
 	double		new_dead_tuples;	/* new estimated total # of dead tuples */
-	BlockNumber pages_removed;
 	double		tuples_deleted;
-	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
-	/* List of TIDs of tuples we intend to delete */
-	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
 	int			num_index_scans;
+	BlockNumber pages_removed;
+	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+/* Struct for lazy vacuum execution */
+typedef struct LVState
+{
+	bool		parallel_mode;
+	LVRelStats *vacrelstats;
+	/*
+	 * Used when both parallel and non-parallel lazy vacuum, but in parallel
+	 * lazy vauum and table with index, dtctl points to a dynamic shared memory
+	 * and controlled by dtctl struct.
+	 */
+	LVDeadTupleCtl	*dtctl;
+	ItemPointer	deadtuples;
+
+	/* Used only for parallel lazy vacuum */
+	ParallelContext *pcxt;
+	LVParallelState *pstate;
+	ParallelHeapScanDesc pscan;
+	LVIndStats *indstats;
+} LVState;
+
+/*
+ * Scan description data for lazy vacuum. In parallel lazy vacuum,
+ * we use only heapscan instead.
+ */
+typedef struct LVScanDescData
+{
+	BlockNumber lv_cblock;					/* current scanning block number */
+	BlockNumber lv_next_unskippable_block;	/* next block number we cannot skip */
+	BlockNumber lv_nblocks;					/* the number blocks of relation */
+	HeapScanDesc heapscan;					/* field for parallel lazy vacuum */
+} LVScanDescData;
+typedef struct LVScanDescData *LVScanDesc;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -141,32 +257,47 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
-
-/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
-			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
-			   bool aggressive);
-static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
+/* nonf-export function prototypes */
+static void lazy_vacuum_heap(Relation onerel, LVState *lvstate);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats);
-static void lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats);
+							  LVState *lvstate);
+static void lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
 						 LVRelStats *vacrelstats);
-static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
-static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr);
+static void lazy_space_alloc(LVState *lvstate, BlockNumber relblocks);
+static void lazy_record_dead_tuple(LVState *state, ItemPointer itemptr);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static void do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irels,
+							  int nindexes, int options, bool aggressive);
+static void lazy_scan_heap(Relation rel, LVState *lvstate, VacuumOptions options,
+						   bool aggressive);
+
+/* function prototypes for parallel vacuum */
+static void lazy_gather_vacuum_stats(ParallelContext *pxct,
+									 LVRelStats *valrelstats);
+static void lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats);
+static void lazy_initialize_dsm(ParallelContext *pcxt, Relation onrel,
+								LVState *lvstate, int options, bool aggressive);
+static LVState *lazy_initialize_worker(shm_toc *toc);
+static LVScanDesc lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan);
+static void lv_endscan(LVScanDesc lvscan);
+static BlockNumber lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+											   LVScanDesc lvscan,
+											   bool *all_visible_according_to_vm,
+											   Buffer *vmbuffer, int options, bool aggressive);
+static void lazy_prepare_vacuum(LVState *lvstate);
+static void lazy_end_vacuum(LVState *lvstate);
+static long lazy_get_max_dead_tuples(LVRelStats *vacrelstats);
 
 
 /*
@@ -179,12 +310,11 @@ static bool heap_page_is_all_visible(Relation rel, Buffer buf,
  *		and locked the relation.
  */
 void
-lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+lazy_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
 				BufferAccessStrategy bstrategy)
 {
-	LVRelStats *vacrelstats;
-	Relation   *Irel;
-	int			nindexes;
+	LVState		*lvstate;
+	LVRelStats	*vacrelstats;
 	PGRUsage	ru0;
 	TimestampTz starttime = 0;
 	long		secs;
@@ -211,7 +341,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		starttime = GetCurrentTimestamp();
 	}
 
-	if (options & VACOPT_VERBOSE)
+	if (options.flags & VACOPT_VERBOSE)
 		elevel = INFO;
 	else
 		elevel = DEBUG2;
@@ -239,10 +369,12 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 											   xidFullScanLimit);
 	aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
 											  mxactFullScanLimit);
-	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+	if (options.flags & VACOPT_DISABLE_PAGE_SKIPPING)
 		aggressive = true;
 
+	lvstate = (LVState *) palloc0(sizeof(LVState));
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
+	lvstate->vacrelstats = vacrelstats;
 
 	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
 	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
@@ -250,15 +382,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
-	/* Open all indexes of the relation */
-	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
-	vacrelstats->hasindex = (nindexes > 0);
-
 	/* Do the vacuuming */
-	lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
-
-	/* Done with indexes */
-	vac_close_indexes(nindexes, Irel, NoLock);
+	lazy_scan_heap(onerel, lvstate, options, aggressive);
 
 	/*
 	 * Compute whether we actually scanned the all unfrozen pages. If we did,
@@ -267,7 +392,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	 * NB: We need to check this before truncating the relation, because that
 	 * will change ->rel_pages.
 	 */
-	if ((vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
+	if ((lvstate->vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
 		< vacrelstats->rel_pages)
 	{
 		Assert(!aggressive);
@@ -329,7 +454,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 						new_rel_pages,
 						new_rel_tuples,
 						new_rel_allvisible,
-						vacrelstats->hasindex,
+						(vacrelstats->nindexes != 0),
 						new_frozen_xid,
 						new_min_multi,
 						false);
@@ -439,28 +564,164 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
- *	lazy_scan_heap() -- scan an open heap relation
+ * If the number of workers is specified more than 0, we enter the parallel lazy
+ * vacuum mode,. In parallel mode, we initialize dynamic shared memories and launch
+ * the parallel vacuum workers. The launcher process also vacuums the table after
+ * launched and then waits for the all vacuum workers to finish. After all vacuum
+ * workers finished we gather the vacuum statistics of table and indexes, and update
+ * them.
+ */
+static void
+lazy_scan_heap(Relation onerel, LVState *lvstate, VacuumOptions options,
+			   bool aggressive)
+{
+	ParallelContext	*pcxt;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+	Relation	*Irel;
+	int			nindexes;
+
+	lvstate->parallel_mode = options.nworkers > 0;
+
+	/* Open indexes */
+	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
+	vacrelstats->nindexes = nindexes;
+
+	if (lvstate->parallel_mode)
+	{
+		EnterParallelMode();
+
+		/* Create parallel context and initialize it */
+		pcxt = CreateParallelContext("postgres", "LazyVacuumWorkerMain",
+									 options.nworkers);
+		lvstate->pcxt = pcxt;
+
+		/* Estimate DSM size for parallel vacuum */
+		lazy_estimate_dsm(pcxt, lvstate->vacrelstats);
+
+		/* Initialize DSM for parallel vacuum */
+		InitializeParallelDSM(pcxt);
+		lazy_initialize_dsm(pcxt, onerel, lvstate, options.flags, aggressive);
+
+		/* Launch workers */
+		LaunchParallelWorkers(pcxt);
+	}
+
+	do_lazy_scan_heap(lvstate, onerel, Irel, nindexes, options.flags, aggressive);
+
+	/*
+	 * We can update relation statistics such as scanned page after gathered
+	 * statistics from all workers. Also, in parallel mode since we cannot update
+	 * index statistics at the same time the leader process have to do it.
+	 *
+	 * XXX : If we allows workers to update statistics tuples at the same time
+	 * the updating index statistics can be done in lazy_cleanup_index().
+	 */
+	if (lvstate->parallel_mode)
+	{
+		int i;
+		LVIndStats *indstats = palloc(sizeof(LVIndStats) * lvstate->vacrelstats->nindexes);
+
+		/* Wait for workers finished vacuum */
+		WaitForParallelWorkersToFinish(pcxt);
+
+		/* Gather the result of vacuum statistics from all workers */
+		lazy_gather_vacuum_stats(pcxt, vacrelstats);
+
+		/* Now we can compute the new value for pg_class.reltuples */
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 vacrelstats->rel_pages,
+															 vacrelstats->scanned_pages,
+															 vacrelstats->scanned_tuples);
+
+		/* Copy new index stats to local memory */
+		memcpy(indstats, lvstate->indstats, sizeof(LVIndStats) * vacrelstats->nindexes);
+
+		DestroyParallelContext(pcxt);
+		ExitParallelMode();
+
+		/* After exit parallel mode, update index statistics */
+		for (i = 0; i < vacrelstats->nindexes; i++)
+		{
+			Relation	ind = Irel[i];
+			LVIndStats *indstat = (LVIndStats *) &(indstats[i]);
+
+			if (indstat->updated)
+			   vac_update_relstats(ind,
+								   indstat->num_pages,
+								   indstat->num_tuples,
+								   0,
+								   false,
+								   InvalidTransactionId,
+								   InvalidMultiXactId,
+								   false);
+		}
+	}
+
+	vac_close_indexes(nindexes, Irel, RowExclusiveLock);
+}
+
+/*
+ * Entry point of parallel vacuum worker.
+ */
+void
+LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc)
+{
+	LVState		*lvstate;
+	Relation rel;
+	Relation *indrel;
+	int nindexes_worker;
+
+	/* Look up dynamic shared memory and initialize */
+	lvstate = lazy_initialize_worker(toc);
+
+	Assert(lvstate != NULL);
+
+	rel = relation_open(lvstate->pscan->phs_relid, ShareUpdateExclusiveLock);
+
+	/* Open all indexes */
+	vac_open_indexes(rel, RowExclusiveLock, &nindexes_worker,
+					 &indrel);
+
+	/* Do lazy vacuum */
+	do_lazy_scan_heap(lvstate, rel, indrel, lvstate->vacrelstats->nindexes,
+					  lvstate->pstate->info.options, lvstate->pstate->info.aggressive);
+
+	vac_close_indexes(lvstate->vacrelstats->nindexes, indrel, RowExclusiveLock);
+	heap_close(rel, ShareUpdateExclusiveLock);
+}
+
+/*
+ *	do_lazy_scan_heap() -- scan an open heap relation
  *
  *		This routine prunes each page in the heap, which will among other
  *		things truncate dead tuples to dead line pointers, defragment the
- *		page, and set commit status bits (see heap_page_prune).  It also builds
+ *		page, and set commit status bits (see heap_page_prune).  It also uses
  *		lists of dead tuples and pages with free space, calculates statistics
  *		on the number of live tuples in the heap, and marks pages as
  *		all-visible if appropriate.  When done, or when we run low on space for
- *		dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
- *		to reclaim dead line pointers.
+ *		dead-tuple TIDs, invoke vacuuming of assigned indexes and call lazy_vacuum_heap
+ *		to reclaim dead line pointers. In parallel vacuum, we need to synchronize
+ *		at where scanning heap finished and vacuuming heap finished. The vacuum
+ *		worker reached to that point first need to wait for other vacuum workers
+ *		reached to the same point.
+ *
+ *		In parallel lazy scan, pscan is not NULL and we get next page number
+ *		using parallel heap scan. We make two synchronization points at where
+ *		before reclaiming dead tuple actually and after reclaimed them.
  *
  *		If there are no indexes then we can reclaim line pointers on the fly;
  *		dead line pointers need only be retained until all index pointers that
  *		reference them have been killed.
  */
 static void
-lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive)
+do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irel,
+				  int nindexes, int options, bool aggressive)
 {
-	BlockNumber nblocks,
-				blkno;
+	LVRelStats *vacrelstats = lvstate->vacrelstats;
+	BlockNumber blkno;
+	BlockNumber nblocks;
 	HeapTupleData tuple;
+	LVScanDesc lvscan;
 	char	   *relname;
 	BlockNumber empty_pages,
 				vacuumed_pages;
@@ -471,11 +732,15 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	IndexBulkDeleteResult **indstats;
 	int			i;
 	PGRUsage	ru0;
+#ifdef PLV_TIME
+	PGRUsage	ru_scan;
+	PGRUsage	ru_vacuum;
+#endif
 	Buffer		vmbuffer = InvalidBuffer;
-	BlockNumber next_unskippable_block;
-	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
+	bool		all_visible_according_to_vm = false;
+
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -504,89 +769,25 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->nonempty_pages = 0;
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
-	lazy_space_alloc(vacrelstats, nblocks);
+	lazy_space_alloc(lvstate, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
+	/* Begin heap scan for vacuum */
+	lvscan = lv_beginscan(onerel, lvstate->pscan);
+
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = vacrelstats->max_dead_tuples;
+	initprog_val[2] = lvstate->dtctl->dt_max;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
-	/*
-	 * Except when aggressive is set, we want to skip pages that are
-	 * all-visible according to the visibility map, but only when we can skip
-	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
-	 * sequentially, the OS should be doing readahead for us, so there's no
-	 * gain in skipping a page now and then; that's likely to disable
-	 * readahead and so be counterproductive. Also, skipping even a single
-	 * page means that we can't update relfrozenxid, so we only want to do it
-	 * if we can skip a goodly number of pages.
-	 *
-	 * When aggressive is set, we can't skip pages just because they are
-	 * all-visible, but we can still skip pages that are all-frozen, since
-	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
-	 *
-	 * Before entering the main loop, establish the invariant that
-	 * next_unskippable_block is the next block number >= blkno that we
-	 * can't skip based on the visibility map, either all-visible for a
-	 * regular scan or all-frozen for an aggressive scan.  We set it to
-	 * nblocks if there's no such block.  We also set up the skipping_blocks
-	 * flag correctly at this stage.
-	 *
-	 * Note: The value returned by visibilitymap_get_status could be slightly
-	 * out-of-date, since we make this test before reading the corresponding
-	 * heap page or locking the buffer.  This is OK.  If we mistakenly think
-	 * that the page is all-visible or all-frozen when in fact the flag's just
-	 * been cleared, we might fail to vacuum the page.  It's easy to see that
-	 * skipping a page when aggressive is not set is not a very big deal; we
-	 * might leave some dead tuples lying around, but the next vacuum will
-	 * find them.  But even when aggressive *is* set, it's still OK if we miss
-	 * a page whose all-frozen marking has just been cleared.  Any new XIDs
-	 * just added to that page are necessarily newer than the GlobalXmin we
-	 * computed, so they'll have no effect on the value to which we can safely
-	 * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
-	 *
-	 * We will scan the table's last page, at least to the extent of
-	 * determining whether it has tuples or not, even if it should be skipped
-	 * according to the above rules; except when we've already determined that
-	 * it's not worth trying to truncate the table.  This avoids having
-	 * lazy_truncate_heap() take access-exclusive lock on the table to attempt
-	 * a truncation that just fails immediately because there are tuples in
-	 * the last page.  This is worth avoiding mainly because such a lock must
-	 * be replayed on any hot standby, where it can be disruptive.
-	 */
-	next_unskippable_block = 0;
-	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-	{
-		while (next_unskippable_block < nblocks)
-		{
-			uint8		vmstatus;
-
-			vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
-												&vmbuffer);
-			if (aggressive)
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-					break;
-			}
-			else
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-					break;
-			}
-			vacuum_delay_point();
-			next_unskippable_block++;
-		}
-	}
+#ifdef PLV_TIME
+	pg_rusage_init(&ru_scan);
+#endif
 
-	if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
-		skipping_blocks = true;
-	else
-		skipping_blocks = false;
-
-	for (blkno = 0; blkno < nblocks; blkno++)
+	while((blkno = lazy_scan_get_nextpage(onerel, lvstate, lvscan,
+										  &all_visible_according_to_vm,
+										  &vmbuffer, options, aggressive)) != InvalidBlockNumber)
 	{
 		Buffer		buf;
 		Page		page;
@@ -597,99 +798,31 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		int			prev_dead_count;
 		int			nfrozen;
 		Size		freespace;
-		bool		all_visible_according_to_vm = false;
 		bool		all_visible;
 		bool		all_frozen = true;	/* provided all_visible is also true */
 		bool		has_dead_tuples;
 		TransactionId visibility_cutoff_xid = InvalidTransactionId;
-
-		/* see note above about forcing scanning of last page */
-#define FORCE_CHECK_PAGE() \
-		(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
+		int			dtmax;
+		int			dtcount;
 
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
-		if (blkno == next_unskippable_block)
-		{
-			/* Time to advance next_unskippable_block */
-			next_unskippable_block++;
-			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-			{
-				while (next_unskippable_block < nblocks)
-				{
-					uint8		vmskipflags;
-
-					vmskipflags = visibilitymap_get_status(onerel,
-														   next_unskippable_block,
-														   &vmbuffer);
-					if (aggressive)
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-							break;
-					}
-					else
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-							break;
-					}
-					vacuum_delay_point();
-					next_unskippable_block++;
-				}
-			}
-
-			/*
-			 * We know we can't skip the current block.  But set up
-			 * skipping_blocks to do the right thing at the following blocks.
-			 */
-			if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
-				skipping_blocks = true;
-			else
-				skipping_blocks = false;
-
-			/*
-			 * Normally, the fact that we can't skip this block must mean that
-			 * it's not all-visible.  But in an aggressive vacuum we know only
-			 * that it's not all-frozen, so it might still be all-visible.
-			 */
-			if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
-				all_visible_according_to_vm = true;
-		}
-		else
-		{
-			/*
-			 * The current block is potentially skippable; if we've seen a
-			 * long enough run of skippable blocks to justify skipping it, and
-			 * we're not forced to check it, then go ahead and skip.
-			 * Otherwise, the page must be at least all-visible if not
-			 * all-frozen, so we can set all_visible_according_to_vm = true.
-			 */
-			if (skipping_blocks && !FORCE_CHECK_PAGE())
-			{
-				/*
-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was all-frozen, so we have to recheck; but
-				 * in this case an approximate answer is OK.
-				 */
-				if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
-					vacrelstats->frozenskipped_pages++;
-				continue;
-			}
-			all_visible_according_to_vm = true;
-		}
-
 		vacuum_delay_point();
 
 		/*
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
 		 */
-		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
-			vacrelstats->num_dead_tuples > 0)
+		if (IsDeadTupleShared(lvstate))
+			SpinLockAcquire(&lvstate->dtctl->mutex);
+
+		dtmax = lvstate->dtctl->dt_max;
+		dtcount = lvstate->dtctl->dt_count;
+
+		if (IsDeadTupleShared(lvstate))
+			SpinLockRelease(&lvstate->dtctl->mutex);
+
+		if (((dtmax - dtcount) < MaxHeapTuplesPerPage) && dtcount > 0)
 		{
 			const int	hvp_index[] = {
 				PROGRESS_VACUUM_PHASE,
@@ -697,6 +830,19 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			};
 			int64		hvp_val[2];
 
+#ifdef PLV_TIME
+			elog(WARNING, "%d Scan %s", ParallelWorkerNumber, pg_rusage_show(&ru_scan));
+#endif
+			/*
+			 * Here we're about to vacuum the table and indexes actually. Before
+			 * entering vacuum state, we have to wait for other vacuum worker to
+			 * reach here.
+			 */
+			lazy_prepare_vacuum(lvstate);
+#ifdef PLV_TIME
+			pg_rusage_init(&ru_vacuum);
+#endif
+
 			/*
 			 * Before beginning index vacuuming, we release any pin we may
 			 * hold on the visibility map page.  This isn't necessary for
@@ -716,11 +862,12 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
 
-			/* Remove index entries */
+			/* Remove assigned index entries */
 			for (i = 0; i < nindexes; i++)
-				lazy_vacuum_index(Irel[i],
-								  &indstats[i],
-								  vacrelstats);
+			{
+				if (IsAssignedIndex(i, lvstate->pstate))
+					lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+			}
 
 			/*
 			 * Report that we are now vacuuming the heap.  We also increase
@@ -733,19 +880,28 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
 
 			/* Remove tuples from heap */
-			lazy_vacuum_heap(onerel, vacrelstats);
+			lazy_vacuum_heap(onerel, lvstate);
 
+#ifdef PLV_TIME
+			elog(WARNING, "%d VACUUM : %s", ParallelWorkerNumber, pg_rusage_show(&ru_vacuum));
+#endif
 			/*
-			 * Forget the now-vacuumed tuples, and press on, but be careful
-			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
+			 * Here we've done vacuum on the heap and index and we are going
+			 * to begin the next round scan on heap. Wait until all vacuum worker
+			 * finished vacuum. After all vacuum workers finished, forget the
+			 * now-vacuumed tuples, and press on, but be careful not to reset
+			 * latestRemoveXid since we want that value to be valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
-			vacrelstats->num_index_scans++;
+			lazy_end_vacuum(lvstate);
+#ifdef PLV_TIME
+			pg_rusage_init(&ru_scan);
+#endif
 
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			vacrelstats->num_index_scans++;
 		}
 
 		/*
@@ -771,7 +927,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * it's OK to skip vacuuming pages we get a lock conflict on. They
 			 * will be dealt with in some future vacuum.
 			 */
-			if (!aggressive && !FORCE_CHECK_PAGE())
+			if (!aggressive && !FORCE_CHECK_PAGE(blkno))
 			{
 				ReleaseBuffer(buf);
 				vacrelstats->pinskipped_pages++;
@@ -923,7 +1079,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
+		prev_dead_count = lvstate->dtctl->dt_count;
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -962,7 +1118,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 */
 			if (ItemIdIsDead(itemid))
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				all_visible = false;
 				continue;
 			}
@@ -1067,7 +1223,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			if (tupgone)
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
 													   &vacrelstats->latestRemovedXid);
 				tups_vacuumed += 1;
@@ -1132,13 +1288,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/*
 		 * If there are no indexes then we can vacuum the page right now
-		 * instead of doing a second scan.
+		 * instead of doing a second scan. Because each parallel worker uses its
+		 * own dead tuple area they can vacuum independently.
 		 */
-		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
+		if (Irel == NULL && lvstate->dtctl->dt_count > 0)
 		{
 			/* Remove tuples from heap */
-			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+			lazy_vacuum_page(onerel, blkno, buf, 0, lvstate, &vmbuffer);
 			has_dead_tuples = false;
 
 			/*
@@ -1146,7 +1302,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lvstate->dtctl->dt_count = 0;
+
 			vacuumed_pages++;
 		}
 
@@ -1249,7 +1406,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * page, so remember its free space as-is.  (This path will always be
 		 * taken if there are no indexes.)
 		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		if (lvstate->dtctl->dt_count == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
@@ -1264,10 +1421,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_dead_tuples = nkeep;
 
 	/* now we can compute the new value for pg_class.reltuples */
-	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
-														 nblocks,
-														 vacrelstats->tupcount_pages,
-														 num_tuples);
+	if (!lvstate->parallel_mode)
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 nblocks,
+															 vacrelstats->tupcount_pages,
+															 num_tuples);
 
 	/*
 	 * Release any remaining pin on visibility map page.
@@ -1280,13 +1438,25 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (vacrelstats->num_dead_tuples > 0)
+	if (lvstate->dtctl->dt_count > 0)
 	{
 		const int	hvp_index[] = {
 			PROGRESS_VACUUM_PHASE,
 			PROGRESS_VACUUM_NUM_INDEX_VACUUMS
 		};
 		int64		hvp_val[2];
+#ifdef PLV_TIME
+		elog(WARNING, "%d Scan %s", ParallelWorkerNumber, pg_rusage_show(&ru_scan));
+#endif
+		/*
+		 * Here we're about to vacuum the table and indexes actually. Before
+		 * entering vacuum state, we have to wait for other vacuum worker to
+		 * reach here.
+		 */
+		lazy_prepare_vacuum(lvstate);
+#ifdef PLV_TIME
+		pg_rusage_init(&ru_vacuum);
+#endif
 
 		/* Log cleanup info before we touch indexes */
 		vacuum_log_cleanup_info(onerel, vacrelstats);
@@ -1297,9 +1467,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/* Remove index entries */
 		for (i = 0; i < nindexes; i++)
-			lazy_vacuum_index(Irel[i],
-							  &indstats[i],
-							  vacrelstats);
+		{
+			if (IsAssignedIndex(i, lvstate->pstate))
+				lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+		}
 
 		/* Report that we are now vacuuming the heap */
 		hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
@@ -1309,8 +1480,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		/* Remove tuples from heap */
 		pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 									 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
-		lazy_vacuum_heap(onerel, vacrelstats);
+
+		lazy_vacuum_heap(onerel, lvstate);
+
 		vacrelstats->num_index_scans++;
+#ifdef PLV_TIME
+		elog(WARNING, "%d VACUUM : %s", ParallelWorkerNumber, pg_rusage_show(&ru_vacuum));
+#endif
 	}
 
 	/* report all blocks vacuumed; and that we're cleaning up */
@@ -1320,7 +1496,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
-		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+	{
+		if (IsAssignedIndex(i, lvstate->pstate))
+			lazy_cleanup_index(Irel[i], indstats[i], lvstate->vacrelstats,
+							   lvstate->parallel_mode ? &(lvstate->indstats[i]) : NULL);
+	}
 
 	/* If no indexes, make log report that lazy_vacuum_heap would've made */
 	if (vacuumed_pages)
@@ -1329,12 +1509,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						RelationGetRelationName(onerel),
 						tups_vacuumed, vacuumed_pages)));
 
+	lv_endscan(lvscan);
+
 	/*
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
 	 */
 	initStringInfo(&buf);
 	appendStringInfo(&buf,
+					 "------- worker %d TOTAL stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
 					 _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
 					 nkeep, OldestXmin);
 	appendStringInfo(&buf, _("There were %.0f unused item pointers.\n"),
@@ -1362,6 +1546,35 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pfree(buf.data);
 }
 
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+lazy_gather_vacuum_stats(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	int	i;
+	LVRelStats *lvstats_list;
+
+	lvstats_list = (LVRelStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_VACUUM_STATS, false);
+
+	/* Gather each worker stats */
+	for (i = 0; i < pcxt->nworkers_launched; i++)
+	{
+		LVRelStats *wstats = lvstats_list + sizeof(LVRelStats) * i;
+
+		vacrelstats->scanned_pages += wstats->scanned_pages;
+		vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+		vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+		vacrelstats->scanned_tuples += wstats->scanned_tuples;
+		vacrelstats->new_dead_tuples += wstats->new_dead_tuples;
+		vacrelstats->pages_removed += wstats->pages_removed;
+		vacrelstats->tuples_deleted += wstats->tuples_deleted;
+		vacrelstats->nonempty_pages += wstats->nonempty_pages;
+	}
+
+	/* all vacuum workers have same value of rel_pages */
+	vacrelstats->rel_pages = lvstats_list->rel_pages;
+}
 
 /*
  *	lazy_vacuum_heap() -- second pass over the heap
@@ -1375,18 +1588,23 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
  * process index entry removal in batches as large as possible.
  */
 static void
-lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
+lazy_vacuum_heap(Relation onerel, LVState *lvstate)
 {
 	int			tupindex;
 	int			npages;
 	PGRUsage	ru0;
+	BlockNumber	prev_tblk;
 	Buffer		vmbuffer = InvalidBuffer;
+	ItemPointer	deadtuples = lvstate->deadtuples;
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+	BlockNumber	ntuples = 0;
 
 	pg_rusage_init(&ru0);
 	npages = 0;
 
 	tupindex = 0;
-	while (tupindex < vacrelstats->num_dead_tuples)
+
+	while (tupindex < dtctl->dt_count)
 	{
 		BlockNumber tblk;
 		Buffer		buf;
@@ -1395,7 +1613,32 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 
 		vacuum_delay_point();
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		if (IsDeadTupleShared(lvstate))
+		{
+			SpinLockAcquire(&(dtctl->mutex));
+
+			tupindex = dtctl->dt_index;
+
+			if (tupindex >= dtctl->dt_count)
+			{
+				SpinLockRelease(&(dtctl->mutex));
+				break;
+			}
+
+			/* Advance dt_index */
+			prev_tblk = tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
+			while(prev_tblk == tblk &&
+				  dtctl->dt_index < dtctl->dt_count)
+			{
+				tblk = ItemPointerGetBlockNumber(&deadtuples[dtctl->dt_index]);
+				dtctl->dt_index++;
+				ntuples++;
+			}
+
+			SpinLockRelease(&(dtctl->mutex));
+		}
+
+		tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vac_strategy);
 		if (!ConditionalLockBufferForCleanup(buf))
@@ -1404,7 +1647,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 			++tupindex;
 			continue;
 		}
-		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, lvstate,
 									&vmbuffer);
 
 		/* Now that we've compacted the page, record its available space */
@@ -1422,10 +1665,16 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		vmbuffer = InvalidBuffer;
 	}
 
+#ifdef PLV_TIME
+	elog(WARNING, "%d TABLE %s", ParallelWorkerNumber, pg_rusage_show(&ru0));
+#endif
+	ereport(elevel,
+			(errmsg("------- worker %d VACUUM HEAP stats -------", ParallelWorkerNumber)));
+
 	ereport(elevel,
 			(errmsg("\"%s\": removed %d row versions in %d pages",
 					RelationGetRelationName(onerel),
-					tupindex, npages),
+					ntuples, npages),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1435,34 +1684,32 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
  *
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
- * tuple for this page.  We assume the rest follow sequentially.
- * The return value is the first tupindex after the tuples of this page.
  */
 static int
 lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxOffsetNumber];
 	int			uncnt = 0;
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+	for (; tupindex < lvstate->dtctl->dt_count; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&lvstate->deadtuples[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&lvstate->deadtuples[tupindex]);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
@@ -1587,12 +1834,12 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
  *	lazy_vacuum_index() -- vacuum one index relation.
  *
  *		Delete all the index entries pointing to tuples listed in
- *		vacrelstats->dead_tuples, and update running statistics.
+ *		lvstate->deadtuples, and update running statistics.
  */
 static void
 lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats)
+				  LVState	*lvstate)
 {
 	IndexVacuumInfo ivinfo;
 	PGRUsage	ru0;
@@ -1603,17 +1850,22 @@ lazy_vacuum_index(Relation indrel,
 	ivinfo.analyze_only = false;
 	ivinfo.estimated_count = true;
 	ivinfo.message_level = elevel;
-	ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
+	ivinfo.num_heap_tuples = lvstate->vacrelstats->old_rel_tuples;
 	ivinfo.strategy = vac_strategy;
 
 	/* Do bulk deletion */
-	*stats = index_bulk_delete(&ivinfo, *stats,
-							   lazy_tid_reaped, (void *) vacrelstats);
+	*stats = index_bulk_delete(&ivinfo, *stats, lazy_tid_reaped, (void *) lvstate);
 
+#ifdef PLV_TIME
+	elog(WARNING, "%d INDEX(%d) %s", ParallelWorkerNumber, RelationGetRelid(indrel),
+		 pg_rusage_show(&ru0));
+#endif
+	ereport(elevel,
+			(errmsg("------- worker %d VACUUM INDEX stats -------", ParallelWorkerNumber)));
 	ereport(elevel,
 			(errmsg("scanned index \"%s\" to remove %d row versions",
 					RelationGetRelationName(indrel),
-					vacrelstats->num_dead_tuples),
+					lvstate->dtctl->dt_count),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1621,9 +1873,8 @@ lazy_vacuum_index(Relation indrel,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  */
 static void
-lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats)
+lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat)
 {
 	IndexVacuumInfo ivinfo;
 	PGRUsage	ru0;
@@ -1639,24 +1890,40 @@ lazy_cleanup_index(Relation indrel,
 
 	stats = index_vacuum_cleanup(&ivinfo, stats);
 
+	if (indstat)
+		indstat->updated = false;
+
 	if (!stats)
 		return;
 
 	/*
 	 * Now update statistics in pg_class, but only if the index says the count
-	 * is accurate.
+	 * is accurate. In parallel lazy vacuum, the worker can not update these
+	 * information by itself, so save to DSM and then the launcher process
+	 * updates it later.
 	 */
 	if (!stats->estimated_count)
-		vac_update_relstats(indrel,
-							stats->num_pages,
-							stats->num_index_tuples,
-							0,
-							false,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							false);
+	{
+		if (indstat)
+		{
+			indstat->updated = true;
+			indstat->num_pages = stats->num_pages;
+			indstat->num_tuples = stats->num_index_tuples;
+		}
+		else
+			vac_update_relstats(indrel,
+								stats->num_pages,
+								stats->num_index_tuples,
+								0,
+								false,
+								InvalidTransactionId,
+								InvalidMultiXactId,
+								false);
+	}
 
 	ereport(elevel,
+			(errmsg("------- worker %d CLEANUP INDEX stats -------", ParallelWorkerNumber)));
+	ereport(elevel,
 			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
 					RelationGetRelationName(indrel),
 					stats->num_index_tuples,
@@ -1976,59 +2243,63 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 /*
  * lazy_space_alloc - space allocation decisions for lazy vacuum
  *
+ * In parallel lazy vacuum the space for dead tuple locations are already
+ * allocated in dynamic shared memory, so we allocate space for dead tuple
+ * locations in local memory only when in not parallel lazy vacuum and set
+ * MyDeadTuple.
+ *
  * See the comments at the head of this file for rationale.
  */
 static void
-lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
+lazy_space_alloc(LVState *lvstate, BlockNumber relblocks)
 {
-	long		maxtuples;
-	int			vac_work_mem = IsAutoVacuumWorkerProcess() &&
-	autovacuum_work_mem != -1 ?
-	autovacuum_work_mem : maintenance_work_mem;
+	long maxtuples;
 
-	if (vacrelstats->hasindex)
-	{
-		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
-		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
-
-		/* curious coding here to ensure the multiplication can't overflow */
-		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
-			maxtuples = relblocks * LAZY_ALLOC_TUPLES;
+	/*
+	 * In parallel mode, we already set the pointer to dead tuple
+	 * array when intialize.
+	 */
+	if (lvstate->parallel_mode && lvstate->vacrelstats->nindexes > 0)
+		return;
 
-		/* stay sane if small maintenance_work_mem */
-		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
-	}
-	else
-	{
-		maxtuples = MaxHeapTuplesPerPage;
-	}
+	maxtuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
 
-	vacrelstats->num_dead_tuples = 0;
-	vacrelstats->max_dead_tuples = (int) maxtuples;
-	vacrelstats->dead_tuples = (ItemPointer)
-		palloc(maxtuples * sizeof(ItemPointerData));
+	/*
+	 * If in not parallel lazy vacuum, we need to allocate dead
+	 * tuple array in local memory.
+	 */
+	lvstate->deadtuples = palloc0(sizeof(ItemPointerData) * (int)maxtuples);
+	lvstate->dtctl = (LVDeadTupleCtl *) palloc(sizeof(LVDeadTupleCtl));
+	lvstate->dtctl->dt_max = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+	lvstate->dtctl->dt_count = 0;
 }
 
 /*
  * lazy_record_dead_tuple - remember one deletable tuple
  */
 static void
-lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr)
+lazy_record_dead_tuple(LVState *lvstate, ItemPointer itemptr)
 {
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockAcquire(&(dtctl->mutex));
+
 	/*
 	 * The array shouldn't overflow under normal behavior, but perhaps it
 	 * could if we are given a really small maintenance_work_mem. In that
 	 * case, just forget the last few tuples (we'll get 'em next time).
 	 */
-	if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
+	if (dtctl->dt_count < dtctl->dt_max)
 	{
-		vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
-		vacrelstats->num_dead_tuples++;
-		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-									 vacrelstats->num_dead_tuples);
+
+		lvstate->deadtuples[dtctl->dt_count] = *itemptr;
+		(dtctl->dt_count)++;
+		/* XXX : Update progress information here */
 	}
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockRelease(&(dtctl->mutex));
 }
 
 /*
@@ -2041,16 +2312,27 @@ lazy_record_dead_tuple(LVRelStats *vacrelstats,
 static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
-	LVRelStats *vacrelstats = (LVRelStats *) state;
+	LVState *lvstate = (LVState *) state;
 	ItemPointer res;
 
+	/*
+	 * In parallel lazy vacuum all dead tuple TID locations are stored into
+	 * dynamic shared memory together and entire dead tuple arrays is not
+	 * ordered. However since each dead tuple array corresponding vacuum
+	 * worker is ordered by TID location we can search 'num' times. Here
+	 * since no write happends vacuum worker access the dead tuple array
+	 * without holding lock.
+	 */
 	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) vacrelstats->dead_tuples,
-								vacrelstats->num_dead_tuples,
+								(void *) lvstate->deadtuples,
+								lvstate->dtctl->dt_count,
 								sizeof(ItemPointerData),
 								vac_cmp_itemptr);
 
-	return (res != NULL);
+	if (res != NULL)
+		return true;
+
+	return false;
 }
 
 /*
@@ -2194,3 +2476,601 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 	return all_visible;
 }
+
+/*
+ * Return the block number we need to scan next, or InvalidBlockNumber if scan
+ * is done.
+ *
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages If we're not in parallel
+ * mode.  Since we're reading sequentially, the OS should be doing readahead
+ * for us, so there's no gain in skipping a page now and then; that's likely
+ * to disable readahead and so be counterproductive. Also, skipping even a
+ * single page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * In not parallel mode, before entering the main loop, establish the
+ * invariant that next_unskippable_block is the next block number >= blkno
+ * that's not we can't skip based on the visibility map, either all-visible
+ * for a regular scan or all-frozen for an aggressive scan.  We set it to
+ * nblocks if there's no such block.  We also set up the skipping_blocks
+ * flag correctly at this stage.
+ *
+ * In parallel mode, pstate is not NULL. We scan heap pages
+ * using parallel heap scan description. Each worker calls heap_parallelscan_nextpage()
+ * in order to exclusively get  block number we need to scan at next.
+ * If given block is all-visible according to visibility map, we skip to
+ * scan this block immediately unlike not parallel lazy scan.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding
+ * heap page or locking the buffer.  This is OK.  If we mistakenly think
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page.  It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them.  But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared.  Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
+ *
+ * We will scan the table's last page, at least to the extent of
+ * determining whether it has tuples or not, even if it should be skipped
+ * according to the above rules; except when we've already determined that
+ * it's not worth trying to truncate the table.  This avoids having
+ * lazy_truncate_heap() take access-exclusive lock on the table to attempt
+ * a truncation that just fails immediately because there are tuples in
+ * the last page.  This is worth avoiding mainly because such a lock must
+ * be replayed on any hot standby, where it can be disruptive.
+ */
+static BlockNumber
+lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+					   LVScanDesc lvscan, bool *all_visible_according_to_vm,
+					   Buffer *vmbuffer, int options, bool aggressive)
+{
+	BlockNumber blkno;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+
+	if (lvstate->parallel_mode)
+	{
+		/*
+		 * In parallel lazy vacuum since it's hard to know how many consecutive
+		 * all-visible pages exits on table we skip to scan the heap page immediately.
+		 * if it is all-visible page.
+		 */
+		while ((blkno = heap_parallelscan_nextpage(lvscan->heapscan)) != InvalidBlockNumber)
+		{
+			*all_visible_according_to_vm = false;
+			vacuum_delay_point();
+
+			/* Consider to skip scan page according visibility map */
+			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0 &&
+				!FORCE_CHECK_PAGE(blkno))
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, blkno, vmbuffer);
+
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+					{
+						vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+					else if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+						*all_visible_according_to_vm = true;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+					{
+						if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+							vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+				}
+			}
+
+			/* We need to scan current blkno, break */
+			break;
+		}
+	}
+	else
+	{
+		bool skipping_blocks = false;
+
+		/* Initialize lv_nextunskippable_page if needed */
+		if (lvscan->lv_cblock == 0 && (options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+		{
+			while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, lvscan->lv_next_unskippable_block,
+													vmbuffer);
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+						break;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+						break;
+				}
+				vacuum_delay_point();
+				lvscan->lv_next_unskippable_block++;
+			}
+
+			if (lvscan->lv_next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+				skipping_blocks = true;
+			else
+				skipping_blocks = false;
+		}
+
+		/* Decide the block number we need to scan */
+		for (blkno = lvscan->lv_cblock; blkno < lvscan->lv_nblocks; blkno++)
+		{
+			if (blkno == lvscan->lv_next_unskippable_block)
+			{
+				/* Time to advance next_unskippable_block */
+				lvscan->lv_next_unskippable_block++;
+				if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+				{
+					while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+					{
+						uint8		vmstatus;
+
+						vmstatus = visibilitymap_get_status(onerel,
+															lvscan->lv_next_unskippable_block,
+															vmbuffer);
+						if (aggressive)
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+								break;
+						}
+						else
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+								break;
+						}
+						vacuum_delay_point();
+						lvscan->lv_next_unskippable_block++;
+					}
+				}
+
+				/*
+				 * We know we can't skip the current block.  But set up
+				 * skipping_all_visible_blocks to do the right thing at the
+				 * following blocks.
+				 */
+				if (lvscan->lv_next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
+					skipping_blocks = true;
+				else
+					skipping_blocks = false;
+
+				/*
+				 * Normally, the fact that we can't skip this block must mean that
+				 * it's not all-visible.  But in an aggressive vacuum we know only
+				 * that it's not all-frozen, so it might still be all-visible.
+				 */
+				if (aggressive && VM_ALL_VISIBLE(onerel, blkno, vmbuffer))
+					*all_visible_according_to_vm = true;
+
+				/* Found out that next unskippable block number */
+				break;
+			}
+			else
+			{
+				/*
+				 * The current block is potentially skippable; if we've seen a
+				 * long enough run of skippable blocks to justify skipping it, and
+				 * we're not forced to check it, then go ahead and skip.
+				 * Otherwise, the page must be at least all-visible if not
+				 * all-frozen, so we can set all_visible_according_to_vm = true.
+				 */
+				if (skipping_blocks && !FORCE_CHECK_PAGE(blkno))
+				{
+					/*
+					 * Tricky, tricky.  If this is in aggressive vacuum, the page
+					 * must have been all-frozen at the time we checked whether it
+					 * was skippable, but it might not be any more.  We must be
+					 * careful to count it as a skipped all-frozen page in that
+					 * case, or else we'll think we can't update relfrozenxid and
+					 * relminmxid.  If it's not an aggressive vacuum, we don't
+					 * know whether it was all-frozen, so we have to recheck; but
+					 * in this case an approximate answer is OK.
+					 */
+					if (aggressive || VM_ALL_FROZEN(onerel, blkno, vmbuffer))
+						vacrelstats->frozenskipped_pages++;
+					continue;
+				}
+
+				*all_visible_according_to_vm = true;
+
+				/* We need to scan current blkno, break */
+				break;
+			}
+		} /* for */
+
+		/* Advance the current block number for the next scan */
+		lvscan->lv_cblock = blkno + 1;
+	}
+
+	return (blkno == lvscan->lv_nblocks) ? InvalidBlockNumber : blkno;
+}
+
+/*
+ * Begin lazy vacuum scan. lvscan->heapscan is NULL if
+ * we're not in parallel lazy vacuum.
+ */
+static LVScanDesc
+lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan)
+{
+	LVScanDesc lvscan;
+
+	lvscan = (LVScanDesc) palloc(sizeof(LVScanDescData));
+
+	lvscan->lv_cblock = 0;
+	lvscan->lv_next_unskippable_block = 0;
+	lvscan->lv_nblocks = RelationGetNumberOfBlocks(onerel);
+
+	if (pscan != NULL)
+		lvscan->heapscan = heap_beginscan_parallel(onerel, pscan);
+	else
+		lvscan->heapscan = NULL;
+
+	return lvscan;
+}
+
+/*
+ * End lazy vacuum scan.
+ */
+static void
+lv_endscan(LVScanDesc lvscan)
+{
+	if (lvscan->heapscan != NULL)
+		heap_endscan(lvscan->heapscan);
+	pfree(lvscan);
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Lazy Vacuum Support
+ * ----------------------------------------------------------------
+ */
+
+/*
+ * Estimate storage for parallel lazy vacuum.
+ */
+static void
+lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	Size size = 0;
+	int keys = 0;
+	int vacuum_workers = pcxt->nworkers + 1;
+	long maxtuples = lazy_get_max_dead_tuples(vacrelstats);
+
+	/* Estimate size for parallel heap scan */
+	size += heap_parallelscan_estimate(SnapshotAny);
+	keys++;
+
+	/* Estimate size for vacuum statistics for only workers*/
+	size += BUFFERALIGN(mul_size(sizeof(LVRelStats), pcxt->nworkers));
+	keys++;
+
+	/* We have to share dead tuple information only when the table has index */
+	if (vacrelstats->nindexes > 0)
+	{
+		/* Estimate size for index statistics */
+		size += BUFFERALIGN(mul_size(sizeof(LVIndStats), vacrelstats->nindexes));
+		keys++;
+
+		/* Estimate size for daed tuple control */
+		size += BUFFERALIGN(sizeof(LVDeadTupleCtl));
+		keys++;
+
+		/* Estimate size for dead tuple array */
+		size += BUFFERALIGN(mul_size(
+							 mul_size(sizeof(ItemPointerData), maxtuples),
+							 vacuum_workers));
+		keys++;
+	}
+
+	/* Estimate size for parallel lazy vacuum state */
+	size += BUFFERALIGN(sizeof(LVParallelState));
+	keys++;
+
+	/* Estimate size for vacuum task */
+	size += BUFFERALIGN(sizeof(VacuumInfo));
+	keys++;
+
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, keys);
+}
+
+/*
+ * Initialize dynamic shared memory for parallel lazy vacuum. We store
+ * relevant informations of parallel heap scanning, dead tuple array
+ * and vacuum statistics for each worker and some parameters for
+ * lazy vacuum.
+ */
+static void
+lazy_initialize_dsm(ParallelContext *pcxt, Relation onerel, LVState *lvstate,
+					int options, bool aggressive)
+{
+	ParallelHeapScanDesc pscan_ptr;
+	ItemPointer	deadtuples_ptr;
+	char 		*lvrelstats_ptr;
+	LVParallelState *pstate_ptr;
+	LVIndStats	*indstats_ptr;
+	LVDeadTupleCtl	*dtctl_ptr;
+	int i;
+	int deadtuples_size;
+	int lvrelstats_size;
+	int	vacuum_workers = pcxt->nworkers + 1;
+	long max_tuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+
+	/* Allocate and initialize DSM for vacuum stats for each worker */
+	lvrelstats_size = mul_size(sizeof(LVRelStats), pcxt->nworkers);
+	lvrelstats_ptr = shm_toc_allocate(pcxt->toc, lvrelstats_size);
+	for (i = 0; i < pcxt->nworkers; i++)
+	{
+		char *start;
+
+		start = lvrelstats_ptr + i * sizeof(LVRelStats);
+		memcpy(start, lvstate->vacrelstats, sizeof(LVRelStats));
+	}
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_STATS, lvrelstats_ptr);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Allocate and initialize DSM for dead tuple control */
+		dtctl_ptr = (LVDeadTupleCtl *) shm_toc_allocate(pcxt->toc, sizeof(LVDeadTupleCtl));
+		SpinLockInit(&(dtctl_ptr->mutex));
+		dtctl_ptr->dt_max = max_tuples * vacuum_workers;
+		dtctl_ptr->dt_count = 0;
+		dtctl_ptr->dt_index = 0;
+		lvstate->dtctl = dtctl_ptr;
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLE_CTL, dtctl_ptr);
+
+		/* Allocate and initialize DSM for dead tuple array */
+		deadtuples_size = mul_size(mul_size(sizeof(ItemPointerData), max_tuples),
+								   vacuum_workers);
+		deadtuples_ptr = (ItemPointer) shm_toc_allocate(pcxt->toc,
+														deadtuples_size);
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLES, deadtuples_ptr);
+		lvstate->deadtuples = deadtuples_ptr;
+
+		/* Allocate DSM for index statistics */
+		indstats_ptr = (LVIndStats *) shm_toc_allocate(pcxt->toc,
+													   mul_size(sizeof(LVIndStats),
+																lvstate->vacrelstats->nindexes));
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_INDEX_STATS, indstats_ptr);
+		lvstate->indstats = indstats_ptr;
+	}
+
+	/* Allocate and initialize DSM for parallel scan description */
+	pscan_ptr = (ParallelHeapScanDesc) shm_toc_allocate(pcxt->toc,
+														heap_parallelscan_estimate(SnapshotAny));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_SCAN, pscan_ptr);
+	heap_parallelscan_initialize(pscan_ptr, onerel, SnapshotAny);
+	lvstate->pscan = pscan_ptr;
+
+	/* Allocate and initialize DSM for parallel vacuum state */
+	pstate_ptr = (LVParallelState *) shm_toc_allocate(pcxt->toc, sizeof(LVParallelState));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_STATE, pstate_ptr);
+
+	ConditionVariableInit(&(pstate_ptr->cv));
+	SpinLockInit(&(pstate_ptr->mutex));
+	pstate_ptr->nworkers = vacuum_workers;
+	pstate_ptr->state = VACSTATE_SCAN;
+	pstate_ptr->info.aggressive = aggressive;
+	pstate_ptr->info.options = options;
+	pstate_ptr->info.oldestxmin = OldestXmin;
+	pstate_ptr->info.freezelimit = FreezeLimit;
+	pstate_ptr->info.multixactcutoff = MultiXactCutoff;
+	pstate_ptr->info.elevel = elevel;
+	lvstate->pstate = pstate_ptr;
+}
+
+/*
+ * Initialize parallel lazy vacuum for worker.
+ */
+static LVState *
+lazy_initialize_worker(shm_toc *toc)
+{
+	LVState	*lvstate;
+	char *lvstats;
+
+	lvstate = (LVState *) palloc(sizeof(LVState));
+	lvstate->parallel_mode = true;
+
+	/* Set up vacuum stats */
+	lvstats = shm_toc_lookup(toc, VACUUM_KEY_VACUUM_STATS, false);
+	lvstate->vacrelstats = (LVRelStats *) (lvstats +
+										   sizeof(LVRelStats) * ParallelWorkerNumber);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Set up daed tuple control */
+		lvstate->dtctl = (LVDeadTupleCtl *) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLE_CTL, false);
+
+		/* Set up dead tuple array */
+		lvstate->deadtuples = (ItemPointer) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLES, false);
+
+		/* Set up index statistics */
+		lvstate->indstats = (LVIndStats *) shm_toc_lookup(toc, VACUUM_KEY_INDEX_STATS, false);
+	}
+
+	/* Set up parallel vacuum state */
+	lvstate->pstate = (LVParallelState *) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_STATE, false);
+
+	/* Set up parallel heap scan description */
+	lvstate->pscan = (ParallelHeapScanDesc) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_SCAN, false);
+
+	/* Set up parameters for lazy vacuum */
+	OldestXmin = lvstate->pstate->info.oldestxmin;
+	FreezeLimit = lvstate->pstate->info.freezelimit;
+	MultiXactCutoff = lvstate->pstate->info.multixactcutoff;
+	elevel = lvstate->pstate->info.elevel;
+
+	return lvstate;
+}
+
+/*
+ * End vacuum
+ */
+static void
+lazy_end_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+	{
+		lvstate->dtctl->dt_count = 0;
+		return;
+	}
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		if (!counted)
+			pstate->finish_count++;
+		finish_count = pstate->finish_count;
+		state = pstate->state;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_SCAN)
+			break;
+
+		/* Wake up other workers if counted up */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			/* Clear dead tuples */
+			lvstate->dtctl->dt_count = 0;
+
+			/* need spinlock ? */
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_SCAN;
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_DONE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Prepare vacuum
+ */
+static void
+lazy_prepare_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+		return;
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		if (!counted)
+			pstate->finish_count++;
+		state = pstate->state;
+		finish_count = pstate->finish_count;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_VACUUM)
+			break;
+
+		/* Wake up other workers if counted up */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * The leader process can change parallel vacuum state if all workers
+		 * have reached here.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			qsort((void *) lvstate->deadtuples, lvstate->dtctl->dt_count,
+				  sizeof(ItemPointerData), vac_cmp_itemptr);
+
+			/* need spinlock ? */
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_VACUUM;
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_PREPARE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Return the number of maximum dead tuples can be stored according
+ * to vac_work_mem.
+ */
+static long
+lazy_get_max_dead_tuples(LVRelStats *vacrelstats)
+{
+	long maxtuples;
+	int	vac_work_mem = IsAutoVacuumWorkerProcess() &&
+		autovacuum_work_mem != -1 ?
+		autovacuum_work_mem : maintenance_work_mem;
+
+	if (vacrelstats->nindexes != 0)
+	{
+		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+		maxtuples = Min(maxtuples, INT_MAX);
+		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+
+		/* curious coding here to ensure the multiplication can't overflow */
+		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > vacrelstats->old_rel_pages)
+			maxtuples = vacrelstats->old_rel_pages * LAZY_ALLOC_TUPLES;
+
+		/* stay sane if small maintenance_work_mem */
+		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+	}
+	else
+	{
+		maxtuples = MaxHeapTuplesPerPage;
+	}
+
+	return maxtuples;
+}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 8d92c03..b258f62 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1662,7 +1662,12 @@ _equalDropdbStmt(const DropdbStmt *a, const DropdbStmt *b)
 static bool
 _equalVacuumStmt(const VacuumStmt *a, const VacuumStmt *b)
 {
-	COMPARE_SCALAR_FIELD(options);
+	if (a->options.flags != b->options.flags)
+		return false;
+
+	if (a->options.nworkers != b->options.nworkers)
+		return false;
+
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_NODE_FIELD(va_cols);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4b1ce09..ac93d90 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -187,6 +187,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
 			   bool *deferrable, bool *initdeferred, bool *not_valid,
 			   bool *no_inherit, core_yyscan_t yyscanner);
 static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
+static VacuumOptions *makeVacOpt(VacuumOption flag, int nworkers);
 
 %}
 
@@ -237,6 +238,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	struct ImportQual	*importqual;
 	InsertStmt			*istmt;
 	VariableSetStmt		*vsetstmt;
+	VacuumOptions		*vacopts;
 	PartitionElem		*partelem;
 	PartitionSpec		*partspec;
 	PartitionBoundSpec	*partboundspec;
@@ -305,7 +307,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 				create_extension_opt_item alter_extension_opt_item
 
 %type <ival>	opt_lock lock_type cast_context
-%type <ival>	vacuum_option_list vacuum_option_elem
+%type <vacopts>	vacuum_option_list vacuum_option_elem
 %type <boolean>	opt_or_replace
 				opt_grant_grant_option opt_grant_admin_option
 				opt_nowait opt_if_exists opt_with_data
@@ -10152,47 +10154,59 @@ cluster_index_specification:
 VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose qualified_name
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = $5;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose AnalyzeStmt
 				{
 					VacuumStmt *n = (VacuumStmt *) $5;
-					n->options |= VACOPT_VACUUM;
+					n->options.flags |= VACOPT_VACUUM;
 					if ($2)
-						n->options |= VACOPT_FULL;
+						n->options.flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						n->options.flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						n->options.flags |= VACOPT_VERBOSE;
+					n->options.nworkers = 0;
 					$$ = (Node *)n;
 				}
 			| VACUUM '(' vacuum_option_list ')'
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions *vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *) n;
@@ -10200,29 +10214,52 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 			| VACUUM '(' vacuum_option_list ')' qualified_name opt_name_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions *vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->relation = $5;
 					n->va_cols = $6;
 					if (n->va_cols != NIL)	/* implies analyze */
-						n->options |= VACOPT_ANALYZE;
+						n->options.flags |= VACOPT_ANALYZE;
 					$$ = (Node *) n;
 				}
 		;
 
 vacuum_option_list:
 			vacuum_option_elem								{ $$ = $1; }
-			| vacuum_option_list ',' vacuum_option_elem		{ $$ = $1 | $3; }
+			| vacuum_option_list ',' vacuum_option_elem
+			{
+				VacuumOptions *vacopts1 = (VacuumOptions *)$1;
+				VacuumOptions *vacopts2 = (VacuumOptions *)$3;
+
+				vacopts1->flags |= vacopts2->flags;
+				if (vacopts2->flags == VACOPT_PARALLEL)
+					vacopts1->nworkers = vacopts2->nworkers;
+
+				$$ = vacopts1;
+				pfree(vacopts2);
+			}
 		;
 
 vacuum_option_elem:
-			analyze_keyword		{ $$ = VACOPT_ANALYZE; }
-			| VERBOSE			{ $$ = VACOPT_VERBOSE; }
-			| FREEZE			{ $$ = VACOPT_FREEZE; }
-			| FULL				{ $$ = VACOPT_FULL; }
+			analyze_keyword		{ $$ = makeVacOpt(VACOPT_ANALYZE, 0); }
+			| VERBOSE			{ $$ = makeVacOpt(VACOPT_VERBOSE, 0); }
+			| FREEZE			{ $$ = makeVacOpt(VACOPT_FREEZE, 0); }
+			| FULL				{ $$ = makeVacOpt(VACOPT_FULL, 0); }
+			| PARALLEL ICONST
+				{
+					if ($2 < 1)
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("parallel vacuum degree must be more than 1"),
+								 parser_errposition(@1)));
+					$$ = makeVacOpt(VACOPT_PARALLEL, $2);
+				}
 			| IDENT
 				{
 					if (strcmp($1, "disable_page_skipping") == 0)
-						$$ = VACOPT_DISABLE_PAGE_SKIPPING;
+						$$ = makeVacOpt(VACOPT_DISABLE_PAGE_SKIPPING, 1);
 					else
 						ereport(ERROR,
 								(errcode(ERRCODE_SYNTAX_ERROR),
@@ -10230,27 +10267,36 @@ vacuum_option_elem:
 									 parser_errposition(@1)));
 				}
 		;
-
 AnalyzeStmt:
 			analyze_keyword opt_verbose
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| analyze_keyword opt_verbose qualified_name opt_name_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = $3;
 					n->va_cols = $4;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 		;
 
@@ -15934,6 +15980,16 @@ makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
 	return (Node *) s;
 }
 
+static VacuumOptions *
+makeVacOpt(VacuumOption flag, int nworkers)
+{
+	VacuumOptions *vacopt = palloc(sizeof(VacuumOptions));
+
+	vacopt->flags = flag;
+	vacopt->nworkers = nworkers;
+	return vacopt;
+}
+
 /* parser_init()
  * Initialize to parse one query string
  */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 00b1e82..cf4d61f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -187,7 +187,7 @@ typedef struct av_relation
 typedef struct autovac_table
 {
 	Oid			at_relid;
-	int			at_vacoptions;	/* bitmask of VacuumOption */
+	VacuumOptions at_vacoptions;	/* contains bitmask of VacuumOption */
 	VacuumParams at_params;
 	int			at_vacuum_cost_delay;
 	int			at_vacuum_cost_limit;
@@ -2483,7 +2483,7 @@ do_autovacuum(void)
 			 * next table in our list.
 			 */
 			HOLD_INTERRUPTS();
-			if (tab->at_vacoptions & VACOPT_VACUUM)
+			if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 				errcontext("automatic vacuum of table \"%s.%s.%s\"",
 						   tab->at_datname, tab->at_nspname, tab->at_relname);
 			else
@@ -2899,10 +2899,11 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab = palloc(sizeof(autovac_table));
 		tab->at_relid = relid;
 		tab->at_sharedrel = classForm->relisshared;
-		tab->at_vacoptions = VACOPT_SKIPTOAST |
+		tab->at_vacoptions.flags = VACOPT_SKIPTOAST |
 			(dovacuum ? VACOPT_VACUUM : 0) |
 			(doanalyze ? VACOPT_ANALYZE : 0) |
 			(!wraparound ? VACOPT_NOWAIT : 0);
+		tab->at_vacoptions.nworkers = 1;
 		tab->at_params.freeze_min_age = freeze_min_age;
 		tab->at_params.freeze_table_age = freeze_table_age;
 		tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
@@ -3149,10 +3150,10 @@ autovac_report_activity(autovac_table *tab)
 	int			len;
 
 	/* Report the command and possible options */
-	if (tab->at_vacoptions & VACOPT_VACUUM)
+	if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: VACUUM%s",
-				 tab->at_vacoptions & VACOPT_ANALYZE ? " ANALYZE" : "");
+				 tab->at_vacoptions.flags & VACOPT_ANALYZE ? " ANALYZE" : "");
 	else
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: ANALYZE");
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index a0b0eec..09eac4e 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3597,6 +3597,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
 			event_name = "ParallelBitmapScan";
 			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_PREPARE:
+			event_name = "ParallelVacuumPrepare";
+			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_DONE:
+			event_name = "ParallelVacuumDone";
+			break;
 		case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
 			event_name = "ProcArrayGroupUpdate";
 			break;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index ddacac8..57ec3f7 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -669,7 +669,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				VacuumStmt *stmt = (VacuumStmt *) parsetree;
 
 				/* we choose to allow this during "read only" transactions */
-				PreventCommandDuringRecovery((stmt->options & VACOPT_VACUUM) ?
+				PreventCommandDuringRecovery((stmt->options.flags & VACOPT_VACUUM) ?
 											 "VACUUM" : "ANALYZE");
 				/* forbidden in parallel mode due to CommandIsReadOnly */
 				ExecVacuum(stmt, isTopLevel);
@@ -2498,7 +2498,7 @@ CreateCommandTag(Node *parsetree)
 			break;
 
 		case T_VacuumStmt:
-			if (((VacuumStmt *) parsetree)->options & VACOPT_VACUUM)
+			if (((VacuumStmt *) parsetree)->options.flags & VACOPT_VACUUM)
 				tag = "VACUUM";
 			else
 				tag = "ANALYZE";
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 08a08c8..3c2d5df 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2040,7 +2040,6 @@ EstimateSnapshotSpace(Snapshot snap)
 	Size		size;
 
 	Assert(snap != InvalidSnapshot);
-	Assert(snap->satisfies == HeapTupleSatisfiesMVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
 	size = add_size(sizeof(SerializedSnapshotData),
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b2132e7..248a670 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -131,6 +131,7 @@ extern Size heap_parallelscan_estimate(Snapshot snapshot);
 extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
 							 Relation relation, Snapshot snapshot);
 extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
+extern BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index a903511..1fc10bf 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "access/heapam.h"
+#include "access/parallel.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -157,7 +159,7 @@ extern int	vacuum_multixact_freeze_table_age;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
-extern void vacuum(int options, RangeVar *relation, Oid relid,
+extern void vacuum(VacuumOptions options, RangeVar *relation, Oid relid,
 	   VacuumParams *params, List *va_cols,
 	   BufferAccessStrategy bstrategy, bool isTopLevel);
 extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
@@ -188,8 +190,9 @@ extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
 
 /* in commands/vacuumlazy.c */
-extern void lazy_vacuum_rel(Relation onerel, int options,
+extern void lazy_vacuum_rel(Relation onerel, VacuumOptions options,
 				VacuumParams *params, BufferAccessStrategy bstrategy);
+extern void LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc);
 
 /* in commands/analyze.c */
 extern void analyze_rel(Oid relid, RangeVar *relation, int options,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 5f2a4a7..d81809b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3087,13 +3087,20 @@ typedef enum VacuumOption
 	VACOPT_FULL = 1 << 4,		/* FULL (non-concurrent) vacuum */
 	VACOPT_NOWAIT = 1 << 5,		/* don't wait to get lock (autovacuum only) */
 	VACOPT_SKIPTOAST = 1 << 6,	/* don't process the TOAST table, if any */
-	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7	/* don't skip any pages */
+	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7,	/* don't skip any pages */
+	VACOPT_PARALLEL = 1 << 8	/* do VACUUM parallelly */
 } VacuumOption;
 
+typedef struct VacuumOptions
+{
+	VacuumOption flags; /* OR of VacuumOption flags */
+	int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;
+
 typedef struct VacuumStmt
 {
 	NodeTag		type;
-	int			options;		/* OR of VacuumOption flags */
+	VacuumOptions	options;
 	RangeVar   *relation;		/* single table to process, or NULL */
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumStmt;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 6bffe63..f3c7a14 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -808,6 +808,8 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_PARALLEL_BITMAP_SCAN,
+	WAIT_EVENT_PARALLEL_VACUUM_PREPARE,
+	WAIT_EVENT_PARALLEL_VACUUM_DONE,
 	WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
 	WAIT_EVENT_SAFE_SNAPSHOT,
 	WAIT_EVENT_SYNC_REP,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 6f68663..8887f4d 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -80,6 +80,7 @@ CONTEXT:  SQL function "do_analyze" statement 1
 SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 -- partitioned table
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 7c5fb04..cbd8c44 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -61,6 +61,7 @@ VACUUM FULL vaccluster;
 VACUUM FULL vactst;
 
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
 
 DROP TABLE vaccluster;
 DROP TABLE vactst;

#40

Masahiko Sawada

sawada.mshk@gmail.com

over 8 years ago

In reply to: Masahiko Sawada (#39)

3 attachment(s)

Re: Block level parallel vacuum WIP

On Wed, Jul 26, 2017 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:

On 3/4/17 9:08 PM, Masahiko Sawada wrote:

On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.

I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.

From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.

Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.

I have marked this patch "Returned with Feedback." Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.

Thank you!

I re-considered the basic design of parallel lazy vacuum. I didn't
change the basic concept of this feature and usage, the lazy vacuum
still executes with some parallel workers. In current design, dead
tuple TIDs are shared with all vacuum workers including leader process
when table has index. If we share dead tuple TIDs, we have to make two
synchronization points: before starting vacuum and before clearing
dead tuple TIDs. Before starting vacuum we have to make sure that the
dead tuple TIDs are not added no more. And before clearing dead tuple
TIDs we have to make sure that it's used no more.

For index vacuum, each indexes is assigned to a vacuum workers based
on ParallelWorkerNumber. For example, if a table has 5 indexes and
vacuum with 2 workers, the leader process and one vacuum worker are
assigned to 2 indexes, and another vacuum process is assigned the
remaining one. The following steps are how the parallel vacuum
processes if table has indexes.

1. The leader process and workers scan the table in parallel using
ParallelHeapScanDesc, and collect dead tuple TIDs to shared memory.
2. Before vacuum on table, the leader process sort the dead tuple TIDs
in physical order once all workers completes to scan the table.
3. In vacuum on table, the leader process and workers reclaim garbage
on table in block-level parallel.
4. In vacuum on indexes, the indexes on table is assigned to
particular parallel worker or leader process. The process assigned to
a index vacuums on the index.
5. Before back to scanning the table, the leader process clears the
dead tuple TIDs once all workers completes to vacuum on table and
indexes.

Attached the latest patch but it's still PoC version patch and
contains some debug codes. Note that this patch still requires another
patch which moves the relation extension lock out of heavy-weight
lock[1]. The parallel lazy vacuum patch could work even without [1]
patch but could fail during vacuum in some cases.

Also, I attached the result of performance evaluation. The table size
is approximately 300MB ( > shared_buffers) and I deleted tuples on
every blocks before execute vacuum so that vacuum visits every blocks.
The server spec is
* Intel Xeon E5620 @ 2.4Ghz (8cores)
* 32GB RAM
* ioDrive

According to the result of table with indexes, performance of lazy
vacuum improved up to a point where the number of indexes and parallel
degree are the same. If a table has 16 indexes and vacuum with 16
workers, parallel vacuum is 10x faster than single process execution.
Also according to the result of table with no indexes, the parallel
vacuum is 5x faster than single process execution at 8 parallel
degree. Of course we can vacuum only for indexes

I'm planning to work on that in PG11, will register it to next CF.
Comment and feedback are very welcome.

Since the previous patch conflicts with current HEAD I attached the
latest version patch. Also, I measured performance benefit with more
large 4GB table and indexes and attached the result.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

result_all.pngimage/png; name=result_all.pngDownload

�PNG


IHDR��,�APLTE���������������@��  ��� �@����@����`��`��`��@��0`��`@@@@���`�``�`���`���@��`��`�`�����` �```    @@ @�`� `�``����@ � ����������  ���`����`�����`�@�@@�����`���������������������``������������� ����  � �� �  �@ �@��`��`�������@��@��`��p��������������������___???ccc�/�a IDATx�����&Eu���z�~J��DP�#X O�7I�Z
P<��WUAAAAAA������_�t���RS���������%S;T���8��@P:����������J�A�5�U����D�K������[@�>�/��z��z� �K�8x����k�p%��0�0B��[CND)$����M�����e��L_>;��%C6����p{&1�l����e/������K���o�x��"��[����Y��!�X��I$�l��gB��=
]�%
@}�,��]���h�PV=��������y������<���P�����q���&�	���6�d--�����z���e�G��s��%�YH;8�M��W�������	�%v
�fY=��Zq���A��t�i���]�f~B�{P/����Y>�h��B��m�*�*���3���-+��+�`�
k����hD�W��{H�����)�����f�/�	����2X\b���yT(������h���	����Z9[&���5��y�����+��>��
BYE@������L���(tE@�@(���]�ZZt���G������I�}`��+d���' t���2%V�@v��|	��lE����]�ZZt�l"z��+��*/�(h!��m��XT?]����a��e�kE~��+-�����~/����@2:n�V��j�|R;v6�!�� 	�<�"?�
(����\�~B`^�X]���n�*���|�`a5H������K�����)!�����{j�6)g���U:Y��G��-X��(v���C�\,���f�e����?�XC����8����b���T?!���y
B�- YS�nbI;R2��nj=���.W�'��|�<4��	��*�����8���?�^�<���^�/
������&b6�{~��t�����������@(��I�y�����_l�5����PX��%3��|�|q�g��8:S��	,�!�`�L�����|��?h��?�� YK�nb��M=�/��Y��_���}���G@�do3����ni6�������]���+�,�������.�W��������J]Q�h_�/��E%��e��5�ks��X�/�G��K�v�l0q]�����9����y@�8
]�*BD��
BYE@����`��&�BWt��,Wb[��+�`�jj�Y��d�	$ki�M,�l+f�"|��&�{l�.�kS�+�	%U�v�7�Y�_��:>X����x�)~�����)X����x�)~���H����X�
8�+#��V���X���%`�l���)~��������/C�=!���m��"?#���,�{�	������"?�(X�	���:���O�,x��^�"?��>}T������H����V�T��Y�P�c�^.�0
]���w����O��"���Y|^n�`�J�K>�����"��\p��2��`�OH��!��~Bzg��v- YS�nbI;RBg�b5����Y�������a��K?!
_>&'�]~Bf�+%��3=i���!w��'�^(�A�����6`�(�6�\�O��do3��B!��8S�5��f�]���������[��8S�5�rJ�g�|M�?_i'B@����l��s�k�����tZ�Hk� �T��x�dM�-�j�(.��|e]�ZZt�j������]�"��" ,Ot$ki�M6����DQV��{��=�����?���[���{���$���`j��{��=f@(��z+�s���LEgK��#E�&��^>gC@�@���nb!.;��C���1y�]�4Z��,;��C���U~m ,MK�n����������i^��y/=�6��o�ZZt����{�C�����
H�6�M�@�u����)�6��C)$��i�z;
�%�\ �0%Pz���Wx��X��(=�T|�.�4{��H�����c1IO5��
��T/�BW	|��#��+�s�ht��+� D�x�wn�WJ��NG!�p%Px�p%���E71�Y�������+�S(������Q���[xy�}����Q�,�~��^����"?���.S����N��17�>�Xo�3Z@����^/>5Qm�TrI��+�SLDO5L�:m�T/W/�e1L%��G[��y'��z���N�vO5�\eU35p�[#���X�kE~���q>�#�$Q�UX�q��0�6�����/��������������7i'�R'&���-�'+��B&����^�E*��V�?@�����m�yz�@\[�
��V�g�	^E�?��S�<�x)���������� �(����E71��hb�JF�]p|��M����`I�`��y��`��&��m����?����[�`�!���D��U������?����E��D@]0(X����j�01��������]���[�`*	@�&Dn��pKWt���	��M�%F`��y��`V���	��M����	Y�Yr����(��)� D��F�gh��^�Rd�n:�4���@�	�YsW�&�K"6o�x��������bh~$�����7A<3AYt��t�M8]nB�5*@�&Dn��rKWt���uN������
� $Z�:����4t�qE�6��G����r<d�F�slzM�L�M`���`!
phW'�zp�����W�&V�
�q���<�?�N��S�.8����p������� �DOp�l���e��`�
���������j�0�2l�	@���K��7��Y5��:?�}pL��M�6|�A�z<�@�=	������A\��2t���{��v��a��BW�v�\��23
]���������(����4LDY'��bfT���~Q�g��������A����������@h�5�t�QE�6Q�0c�S���5�&X�N���BX��
-
]����K����4��6�M��&� ���_/{0��X�[�B_�t�uB�)� 0������>tm��t�/�^@4�%+����t}��[��T�O��=��'���Jc"0��E7�"l�~X� �y���q��	q��0b�:� �k:�$�����k���eV��#�nmJ�E��7�y�rx�
P��`v�p |������mi�M��@a]p�2�	Q>~�'�������O�� �9��>8��6��t.DP�a6�A�~B���(���{����O����'M`�:�5�������R�=��O�������Q�C�X��79�&��D0}�:�$��-X��E�6h`z=	�w����i�m�v |�Nt��r�>BD�f�J�k�G�yTbe���a��o$pS���{��G%V����C�Ztk��9����I���M`���5�	�]�������<*�2l�[s��Q�����F :�OD�6��`�ZO0���hK�A�	@�����Q�C�X6`��}���t+�4��N*������>���z�1���+�k����4U�_[�=�A��+���Z��{D��Q�*��O�G�W�N�����p���0��`\����"E@s���oi�M���
B<��}��&V��b��>8����P�&� �@81<������C[�o�[��TgZ@�q}����]�}�8��������s��6%p����'d���@1
]%{,<��w���t�'D�6�[���}�O���&��w�L%��u]�&
��}��	?H��nb���@wm2H����)���Wu�L�q�+������k�	����O�,���B|��Y��D�9O<
�Z�#?!�@`" s��7�#?!���OP�Q��(��'D�
�Gm�H��cb�����O���H+��>�B7�)�����@/@�����]�k�G{��6�`�%����!'�$$wb���O��X�p,����<���%���_W���6�M����	�����?�W��]�[��S��@L��Pt<U5���m�����B��;c!�,3�^���A�����M�� ���x�j���9l@����$�$���}p�nx/�������`P� ������k
9��_H��`F�?
���0Yf ��c����Utk���BE��a~B��W:����������j���m��L��8N]W��	Q��vg~�qr3���8���wU�>�0���r���e6���E��1�SX$�-����4(�I ��V�]�Y�;���X5�6�`�J���	Q>^1}np���'$�GR?!}��BU����;a��yBg�lz��C��(�;���`��ige"��O���M�� Q%|���	Q���a��2�{��A��_L���o��`nf ��_��t*lPB�6��LI`U�AC����x*�A�"`Jp`c�n����_������� �Dtd��^����nm�
��>�Ax6�+	|6�DE��������A��l�K�����6��KD@�k{�YO���������=�������y
!A�P�4��"�ZZs7�/�[�/[�=�W�O�=�0���S�n���}1l����K�����M��$*���������1����
K����������<p r�m������	�I�{�Y���a �� 4*@�:�z���\����8^�Q�7��
�cm�t4+����D@����&0���Gd�d�������-@rV��������m@��>��)1���O��S
��L�������[��r�l���z������"@7	��v��������T��4W��7������#����I�g�Cp/*�	`�v@�6�s�HAbM!9�p8mNG!K��k�4���?V
a&S����J4����&
`2���6w�8�������)�6>���K��NRG5K���!�gV�"������B�}C�.x���z���
3����I��IH�qPc�Tc�\�� ����C�G'���Gq6O5k�{��f#�t�!A��(��0��zb��r�6��RX�k@�*�Y}��I�m�
������"�����������7F���1x?y��B�,���0���}��8=2��d����1���c
�g�)�[������	]�U������Mr@���W#��W���
�@��]Q0����cpUH�L�6��,�B���z����0}�t7�t
s�M)�� ���l��nV��	a��NX��:d�tkz��d?]���C ��(�C�L�|Us��n��8�$�������&	 ]
�H@���9�F
s�&�R!�z�U����E�@�gP��P�L���a�,-	a�F�D�hP�����z��B@&�>@�����
`v��Cv�����l���u/�����K�>���9�;��o
oD�Y���e�@�dp�	a��4"�����x-�&k��w��[�l�w����Mo���������F1�\'P:Gl:�F�M�rb+���z	�>��iv�tp����C/!hj	J�}�z�c�*��G�C���|����[ h@� ���hp��W\��� �7��4kOL���x���,�/SR�z��4���<(q����/��E�N�8x�<��?!���#
9;C��G��E�	���?��B���\���>��C�}C:�����&D��������C�`�p���7!����+��Uz �z~21s��/��Q��M�6����@�uH��T���bx.tElA������{~l�	s�XmRD@���VER8��E������p�M�.����M'�����B�a�U���3F
�2��I}	x�V��T����XY�]������������:_�,Z��h���-�����_������Z,rN\C���e�v�.��6����?�����0Y���^0I�C�.�"�>�W��:����6!�S�D���f���^g���f1M�dgkVP&�����+��soX�`�������&�OS@�G��b8��-b����`�#�P=U������6`%A{=G��E5�����m�+Ef"Zk��y�0�m��x�P�l6�����^`�����(�<�S���(k�H��������/������8���4��{��_�)���M�V��s���_�#������
K�F�6�%���L�
��3�X��0@�����_\�����o6=�G�r��i���GH�3_t���hm��j~�~3������q1y��^��I�2���x��Ga��l?
����wR~`^�����2�W��W�Jk��~����3����"]t1?|9E'�����{�e����K��r�.��)��;�u�����������U�*�s�&��
�]����9�l]���Q�(X���S��rR��8/&��|����Xa����\��.�4
M����� �EOt�r,�4?!����r�f��@�4?!����r~m�YE�6]�k�����C�t��hN�QN��93k�5Q�^L|9��l�W�&�����b����A�:/��!�/&�>�+�S��� �Lgd�����,��:Sa��d��_t1W���O�E��z �&?!t�~B � � � � z�k���e��E�A��I����a����n�[c��	f?��(��4�)�P+���&�z���M����d�D�^���B���A����2��������~�];��XCx4#������XcD���
��cg�h
�)�P+���e{�2@+5�m��6�m�@�he C�B�!�B��`� ��5�R��1��4�%�Xikm�hF��y�v�R��k}��E���W����`E)������D�v�2�V��L��� B�B��`� ��cz+�B����GS�@~fV�����W����������w%dW"D;cE+�M6�2!z!�C�8����H�����s
����km��p:���t�x��O�Ms�����/-��mi��2��je`P91������Y/��)���$��*��9��>MF��%crM=�������@s1Y�um2�:n�T>nAhe B�29k� ���`Ma_�7�a�����qC�X'l,�MWi7���k����|w-
�����,��m?��]!��sd�e�p��8%�d
Z#T/aQ����!'���`�M^�6
O���
MQl����������Y��O��}!����$��^�8�Ns5(�e�m����0_C�.�i�N��_g�������T����&���2m��BXe Fz�.���5C/���|_�5��8���n������h���tc�'e�����ft������2a+�v���A�sz�(z,��� NL/9���@m��'�/y
_���k���]7�y�q�-������B���wo1���Y���2i:2���?�]��
�B!Z��BP����&����V��ND���e����6!j!�k����u����\��g;��LQ��.i-#j�s5���(��?�H��A�he �a���6il�@��a������d�Vk���k��AAAAAAAAAA\�+�w��jP�[<b;`I&���=V���:�"v��]?:����$np~i���%9��{��pk��Z�Z|���=��ms�k������<�~��������M85����_3����dm���&��9
yX~����FU���=�����_�Z@5p��lIC9l]*\�N���j"{����IDAT���/��q�6e&e�������r�m��`��m6�g�������aM]�.P����
.e�(�z����Fn�@pl�@-
y��/T�v�G�\C}��d�b���0�Z7���i�;�Wx���(����6ND�@e�����i�dn�\�X�7r�_rp���L��(K �I�}K;R^�@PRq�0��2�O_g�4��m��AAA�����T���IEND�B`�

parallel_vacuum_v4.patchapplication/octet-stream; name=parallel_vacuum_v4.patchDownload

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 421c18d..b93231f 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
+VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | PARALLEL <replaceable class="PARAMETER">N</replaceable> | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable class="PARAMETER">table_name</replaceable> ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 </synopsis>
@@ -130,6 +130,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">
    </varlistentry>
 
    <varlistentry>
+    <term><literal>PARALLEL <replaceable class="PARAMETER">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable class="PARAMETER">N
+      </replaceable> background workers. Collecting garbage on table is processed
+      in block-level parallel. For tables with indexes, parallel vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a index are processed
+      by particular parallel vacuum worker. This option can not use with <literal>FULL</>
+      option.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>DISABLE_PAGE_SKIPPING</literal></term>
     <listitem>
      <para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index e283fe5..be3aa80 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -89,7 +89,6 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
 						bool is_bitmapscan,
 						bool is_samplescan,
 						bool temp_snap);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1668,7 +1667,7 @@ heap_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
  *		first backend gets an InvalidBlockNumber return.
  * ----------------
  */
-static BlockNumber
+BlockNumber
 heap_parallelscan_nextpage(HeapScanDesc scan)
 {
 	BlockNumber page = InvalidBlockNumber;
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 17b1038..700286c 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -19,6 +19,7 @@
 #include "access/xlog.h"
 #include "catalog/namespace.h"
 #include "commands/async.h"
+#include "commands/vacuum.h"
 #include "executor/execParallel.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -118,6 +119,9 @@ static const struct
 {
 	{
 		"ParallelQueryMain", ParallelQueryMain
+	},
+	{
+		"LazyVacuumWorkerMain", LazyVacuumWorkerMain
 	}
 };
 
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index faa1812..505f0fe 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -72,7 +72,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
 				  MultiXactId lastSaneMinMulti);
-static bool vacuum_rel(Oid relid, RangeVar *relation, int options,
+static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options,
 		   VacuumParams *params);
 
 /*
@@ -87,17 +87,17 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
 	VacuumParams params;
 
 	/* sanity checks on options */
-	Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
-	Assert((vacstmt->options & VACOPT_VACUUM) ||
-		   !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
-	Assert((vacstmt->options & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
-	Assert(!(vacstmt->options & VACOPT_SKIPTOAST));
+	Assert(vacstmt->options.flags & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert((vacstmt->options.flags & VACOPT_VACUUM) ||
+		   !(vacstmt->options.flags & (VACOPT_FULL | VACOPT_FREEZE)));
+	Assert((vacstmt->options.flags & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
+	Assert(!(vacstmt->options.flags & VACOPT_SKIPTOAST));
 
 	/*
 	 * All freeze ages are zero if the FREEZE option is given; otherwise pass
 	 * them as -1 which means to use the default values.
 	 */
-	if (vacstmt->options & VACOPT_FREEZE)
+	if (vacstmt->options.flags & VACOPT_FREEZE)
 	{
 		params.freeze_min_age = 0;
 		params.freeze_table_age = 0;
@@ -146,7 +146,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
  * memory context that will not disappear at transaction commit.
  */
 void
-vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
+vacuum(VacuumOptions options, RangeVar *relation, Oid relid, VacuumParams *params,
 	   List *va_cols, BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	const char *stmttype;
@@ -157,7 +157,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 
 	Assert(params != NULL);
 
-	stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
+	stmttype = (options.flags & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
 
 	/*
 	 * We cannot run VACUUM inside a user transaction block; if we were inside
@@ -167,7 +167,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	 *
 	 * ANALYZE (without VACUUM) can run either way.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 	{
 		PreventTransactionChain(isTopLevel, stmttype);
 		in_outer_xact = false;
@@ -189,17 +189,26 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
-	if ((options & VACOPT_FULL) != 0 &&
-		(options & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
 
 	/*
+	 * Sanity check PARALLEL option.
+	 */
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_PARALLEL) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("VACUUM option PARALLEL cannnot be used with FULL")));
+
+	/*
 	 * Send info about dead objects to the statistics collector, unless we are
 	 * in autovacuum --- autovacuum.c does this for itself.
 	 */
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 		pgstat_vacuum_stat();
 
 	/*
@@ -245,11 +254,11 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	 * transaction block, and also in an autovacuum worker, use own
 	 * transactions so we can release locks sooner.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 		use_own_xacts = true;
 	else
 	{
-		Assert(options & VACOPT_ANALYZE);
+		Assert(options.flags & VACOPT_ANALYZE);
 		if (IsAutoVacuumWorkerProcess())
 			use_own_xacts = true;
 		else if (in_outer_xact)
@@ -299,13 +308,13 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 		{
 			Oid			relid = lfirst_oid(cur);
 
-			if (options & VACOPT_VACUUM)
+			if (options.flags & VACOPT_VACUUM)
 			{
 				if (!vacuum_rel(relid, relation, options, params))
 					continue;
 			}
 
-			if (options & VACOPT_ANALYZE)
+			if (options.flags & VACOPT_ANALYZE)
 			{
 				/*
 				 * If using separate xacts, start one for analyze. Otherwise,
@@ -318,7 +327,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 					PushActiveSnapshot(GetTransactionSnapshot());
 				}
 
-				analyze_rel(relid, relation, options, params,
+				analyze_rel(relid, relation, options.flags, params,
 							va_cols, in_outer_xact, vac_strategy);
 
 				if (use_own_xacts)
@@ -354,7 +363,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 		StartTransactionCommand();
 	}
 
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 	{
 		/*
 		 * Update pg_database.datfrozenxid, and truncate pg_xact if possible.
@@ -1221,7 +1230,7 @@ vac_truncate_clog(TransactionId frozenXID,
  *		At entry and exit, we are not inside a transaction.
  */
 static bool
-vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
+vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options, VacuumParams *params)
 {
 	LOCKMODE	lmode;
 	Relation	onerel;
@@ -1242,7 +1251,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 */
 	PushActiveSnapshot(GetTransactionSnapshot());
 
-	if (!(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_FULL))
 	{
 		/*
 		 * In lazy vacuum, we can set the PROC_IN_VACUUM flag, which lets
@@ -1282,7 +1291,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * vacuum, but just ShareUpdateExclusiveLock for concurrent vacuum. Either
 	 * way, we can be sure that no other backend is vacuuming the same table.
 	 */
-	lmode = (options & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+	lmode = (options.flags & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * Open the relation and get the appropriate lock on it.
@@ -1293,7 +1302,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * If we've been asked not to wait for the relation lock, acquire it first
 	 * in non-blocking mode, before calling try_relation_open().
 	 */
-	if (!(options & VACOPT_NOWAIT))
+	if (!(options.flags & VACOPT_NOWAIT))
 		onerel = try_relation_open(relid, lmode);
 	else if (ConditionalLockRelationOid(relid, lmode))
 		onerel = try_relation_open(relid, NoLock);
@@ -1412,7 +1421,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * us to process it.  In VACUUM FULL, though, the toast table is
 	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
 	 */
-	if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_SKIPTOAST) && !(options.flags & VACOPT_FULL))
 		toast_relid = onerel->rd_rel->reltoastrelid;
 	else
 		toast_relid = InvalidOid;
@@ -1431,7 +1440,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	/*
 	 * Do the actual work --- either FULL or "lazy" vacuum
 	 */
-	if (options & VACOPT_FULL)
+	if (options.flags & VACOPT_FULL)
 	{
 		/* close relation before vacuuming, but hold lock until commit */
 		relation_close(onerel, NoLock);
@@ -1439,7 +1448,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 		cluster_rel(relid, InvalidOid, false,
-					(options & VACOPT_VERBOSE) != 0);
+					(options.flags & VACOPT_VERBOSE) != 0);
 	}
 	else
 		lazy_vacuum_rel(onerel, options, params, vac_strategy);
@@ -1493,8 +1502,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
  * hit dangling index pointers.
  */
 void
-vac_open_indexes(Relation relation, LOCKMODE lockmode,
-				 int *nindexes, Relation **Irel)
+vac_open_indexes(Relation relation, LOCKMODE lockmode, int *nindexes, Relation **Irel)
 {
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index e9b4045..0ec0cd3 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -22,6 +22,20 @@
  * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * In PostgreSQL 10, we support a parallel option for lazy vacuum. In parallel
+ * lazy vacuum, multiple vacuum worker processes get blocks in parallel using
+ * parallel heap scan and process them. If a table with indexes the parallel
+ * vacuum workers vacuum the heap and indexes in parallel.  Also, since dead
+ * tuple TIDs is shared with all vacuum processes including the leader process
+ * the parallel vacuum processes have to make two synchronization points in
+ * lazy vacuum processing: before starting vacuum and before clearing dead
+ * tuple TIDs. In those two points the leader treats dead tuple TIDs as an
+ * arbiter. The information required by parallel lazy vacuum such as the
+ * statistics of table, parallel heap scan description have to be shared with
+ * all vacuum processes, and table statistics are funneled by the leader
+ * process after finished. However, dead tuple TIDs need to be shared only
+ * when the table has indexes. For table with no indexes, each parallel worker
+ * processes blocks and vacuum them independently.
  *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -41,8 +55,10 @@
 #include "access/heapam_xlog.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/relscan.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -54,6 +70,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "utils/lsyscache.h"
@@ -62,6 +79,7 @@
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
 
+//#define PLV_TIME
 
 /*
  * Space/time tradeoff parameters: do these need to be user-tunable?
@@ -103,10 +121,81 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+/* DSM key for parallel lazy vacuum */
+#define VACUUM_KEY_PARALLEL_SCAN	UINT64CONST(0xFFFFFFFFFFF00001)
+#define VACUUM_KEY_VACUUM_STATS		UINT64CONST(0xFFFFFFFFFFF00002)
+#define VACUUM_KEY_INDEX_STATS	    UINT64CONST(0xFFFFFFFFFFF00003)
+#define VACUUM_KEY_DEAD_TUPLE_CTL	UINT64CONST(0xFFFFFFFFFFF00004)
+#define VACUUM_KEY_DEAD_TUPLES		UINT64CONST(0xFFFFFFFFFFF00005)
+#define VACUUM_KEY_PARALLEL_STATE	UINT64CONST(0xFFFFFFFFFFF00006)
+
+/*
+ * see note of lazy_scan_heap_get_nextpage about forcing scanning of
+ * last page
+ */
+#define FORCE_CHECK_PAGE(blk) \
+	(blkno == (blk - 1) && should_attempt_truncation(vacrelstats))
+
+/* Check if given index is assigned to this parallel vacuum worker */
+#define IsAssignedIndex(i, pstate) \
+	(pstate == NULL || \
+	 (((i) % ((LVParallelState *) (pstate))->nworkers -1 ) == ParallelWorkerNumber))
+
+#define IsDeadTupleShared(lvstate) \
+	((LVState *)(lvstate))->parallel_mode && \
+	((LVState *)(lvstate))->vacrelstats->nindexes > 0
+
+/* Vacuum worker state for parallel lazy vacuum */
+#define VACSTATE_SCAN			0x1	/* heap scan phase */
+#define VACSTATE_VACUUM			0x2	/* vacuuming on table and index */
+
+/*
+ * Vacuum relevant options and thresholds we need share with parallel
+ * vacuum workers.
+ */
+typedef struct VacuumInfo
+{
+	int				options;	/* VACUUM options */
+	bool			aggressive;	/* does each worker need to aggressive vacuum? */
+	TransactionId	oldestxmin;
+	TransactionId	freezelimit;
+	MultiXactId		multixactcutoff;
+	int				elevel;
+} VacuumInfo;
+
+/* Struct for index statistics that are used for parallel lazy vacuum */
+typedef struct LVIndStats
+{
+	bool		updated;	/* need to be updated? */
+	BlockNumber	num_pages;
+	BlockNumber	num_tuples;
+} LVIndStats;
+
+/* Struct for parallel lazy vacuum state */
+typedef struct LVParallelState
+{
+	int nworkers;			/* # of process doing vacuum */
+	VacuumInfo	info;
+	int	state;				/* current parallel vacuum status */
+	int	finish_count;
+	ConditionVariable cv;
+	slock_t	mutex;
+} LVParallelState;
+
+/* Struct for control dead tuple TIDs array */
+typedef struct LVDeadTupleCtl
+{
+	int			dt_max;	/* # slots allocated in array */
+	int 		dt_count; /* # of dead tuple */
+
+	/* Used only for parallel lazy vacuum */
+	int			dt_index;
+	slock_t 	mutex;
+} LVDeadTupleCtl;
+
 typedef struct LVRelStats
 {
-	/* hasindex = true means two-pass strategy; false means one-pass */
-	bool		hasindex;
+	int			nindexes; /* > 0 means two-pass strategy; = 0 means one-pass */
 	/* Overall statistics about rel */
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
@@ -118,19 +207,46 @@ typedef struct LVRelStats
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
 	double		new_dead_tuples;	/* new estimated total # of dead tuples */
-	BlockNumber pages_removed;
 	double		tuples_deleted;
-	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
-	/* List of TIDs of tuples we intend to delete */
-	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
 	int			num_index_scans;
+	BlockNumber pages_removed;
+	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+/* Struct for lazy vacuum execution */
+typedef struct LVState
+{
+	bool		parallel_mode;
+	LVRelStats *vacrelstats;
+	/*
+	 * Used when both parallel and non-parallel lazy vacuum, but in parallel
+	 * lazy vacuum and table with index, dtctl points to a dynamic shared memory
+	 * and controlled by dtctl struct.
+	 */
+	LVDeadTupleCtl	*dtctl;
+	ItemPointer	deadtuples;
+
+	/* Used only for parallel lazy vacuum */
+	ParallelContext *pcxt;
+	LVParallelState *pstate;
+	ParallelHeapScanDesc pscan;
+	LVIndStats *indstats;
+} LVState;
+
+/*
+ * Scan description data for lazy vacuum. In parallel lazy vacuum,
+ * we use only heapscan instead.
+ */
+typedef struct LVScanDescData
+{
+	BlockNumber lv_cblock;					/* current scanning block number */
+	BlockNumber lv_next_unskippable_block;	/* next block number we cannot skip */
+	BlockNumber lv_nblocks;					/* the number blocks of relation */
+	HeapScanDesc heapscan;					/* field for parallel lazy vacuum */
+} LVScanDescData;
+typedef struct LVScanDescData *LVScanDesc;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -141,32 +257,47 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
-
-/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
-			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
-			   bool aggressive);
-static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
+/* nonf-export function prototypes */
+static void lazy_vacuum_heap(Relation onerel, LVState *lvstate);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats);
-static void lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats);
+							  LVState *lvstate);
+static void lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
 						 LVRelStats *vacrelstats);
-static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
-static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr);
+static void lazy_space_alloc(LVState *lvstate, BlockNumber relblocks);
+static void lazy_record_dead_tuple(LVState *state, ItemPointer itemptr);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static void do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irels,
+							  int nindexes, int options, bool aggressive);
+static void lazy_scan_heap(Relation rel, LVState *lvstate, VacuumOptions options,
+						   bool aggressive);
+
+/* function prototypes for parallel vacuum */
+static void lazy_gather_vacuum_stats(ParallelContext *pxct,
+									 LVRelStats *valrelstats);
+static void lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats);
+static void lazy_initialize_dsm(ParallelContext *pcxt, Relation onrel,
+								LVState *lvstate, int options, bool aggressive);
+static LVState *lazy_initialize_worker(shm_toc *toc);
+static LVScanDesc lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan);
+static void lv_endscan(LVScanDesc lvscan);
+static BlockNumber lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+											   LVScanDesc lvscan,
+											   bool *all_visible_according_to_vm,
+											   Buffer *vmbuffer, int options, bool aggressive);
+static void lazy_prepare_vacuum(LVState *lvstate);
+static void lazy_end_vacuum(LVState *lvstate);
+static long lazy_get_max_dead_tuples(LVRelStats *vacrelstats);
 
 
 /*
@@ -179,12 +310,11 @@ static bool heap_page_is_all_visible(Relation rel, Buffer buf,
  *		and locked the relation.
  */
 void
-lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+lazy_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
 				BufferAccessStrategy bstrategy)
 {
-	LVRelStats *vacrelstats;
-	Relation   *Irel;
-	int			nindexes;
+	LVState		*lvstate;
+	LVRelStats	*vacrelstats;
 	PGRUsage	ru0;
 	TimestampTz starttime = 0;
 	long		secs;
@@ -211,7 +341,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		starttime = GetCurrentTimestamp();
 	}
 
-	if (options & VACOPT_VERBOSE)
+	if (options.flags & VACOPT_VERBOSE)
 		elevel = INFO;
 	else
 		elevel = DEBUG2;
@@ -239,10 +369,12 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 											   xidFullScanLimit);
 	aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
 											  mxactFullScanLimit);
-	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+	if (options.flags & VACOPT_DISABLE_PAGE_SKIPPING)
 		aggressive = true;
 
+	lvstate = (LVState *) palloc0(sizeof(LVState));
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
+	lvstate->vacrelstats = vacrelstats;
 
 	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
 	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
@@ -250,15 +382,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
-	/* Open all indexes of the relation */
-	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
-	vacrelstats->hasindex = (nindexes > 0);
-
 	/* Do the vacuuming */
-	lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
-
-	/* Done with indexes */
-	vac_close_indexes(nindexes, Irel, NoLock);
+	lazy_scan_heap(onerel, lvstate, options, aggressive);
 
 	/*
 	 * Compute whether we actually scanned the all unfrozen pages. If we did,
@@ -267,7 +392,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	 * NB: We need to check this before truncating the relation, because that
 	 * will change ->rel_pages.
 	 */
-	if ((vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
+	if ((lvstate->vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
 		< vacrelstats->rel_pages)
 	{
 		Assert(!aggressive);
@@ -329,7 +454,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 						new_rel_pages,
 						new_rel_tuples,
 						new_rel_allvisible,
-						vacrelstats->hasindex,
+						(vacrelstats->nindexes != 0),
 						new_frozen_xid,
 						new_min_multi,
 						false);
@@ -439,28 +564,166 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
- *	lazy_scan_heap() -- scan an open heap relation
+ * If the number of workers is specified more than 0, we enter the parallel lazy
+ * vacuum mode. In parallel lazy vacuum mode, we initialize a dynamic shared memory
+ * and launch parallel vacuum workers. The launcher process also vacuums the table
+ * after launched and then waits for the all vacuum workers to finish. After all vacuum
+ * workers finished we gather the vacuum statistics of table and indexes, and update
+ * them.
+ */
+static void
+lazy_scan_heap(Relation onerel, LVState *lvstate, VacuumOptions options,
+			   bool aggressive)
+{
+	ParallelContext	*pcxt;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+	Relation	*Irel;
+	int			nindexes;
+
+	lvstate->parallel_mode = options.nworkers > 0;
+
+	/* Open indexes */
+	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
+	vacrelstats->nindexes = nindexes;
+
+	if (lvstate->parallel_mode)
+	{
+		EnterParallelMode();
+
+		/* Create parallel context and initialize it */
+		pcxt = CreateParallelContext("postgres", "LazyVacuumWorkerMain",
+									 options.nworkers);
+		lvstate->pcxt = pcxt;
+
+		/* Estimate DSM size for parallel vacuum */
+		lazy_estimate_dsm(pcxt, lvstate->vacrelstats);
+
+		/* Initialize DSM for parallel vacuum */
+		InitializeParallelDSM(pcxt);
+		lazy_initialize_dsm(pcxt, onerel, lvstate, options.flags, aggressive);
+
+		/* Launch workers */
+		LaunchParallelWorkers(pcxt);
+	}
+
+	do_lazy_scan_heap(lvstate, onerel, Irel, nindexes, options.flags, aggressive);
+
+	/*
+	 * We can update relation statistics such as scanned page after gathered
+	 * statistics from all workers. Also, in parallel mode since we cannot update
+	 * index statistics at the same time the leader process have to do it.
+	 *
+	 * XXX : If we allows workers to update statistics tuples at the same time
+	 * the updating index statistics can be done in lazy_cleanup_index().
+	 */
+	if (lvstate->parallel_mode)
+	{
+		int i;
+		LVIndStats *indstats = palloc(sizeof(LVIndStats) * lvstate->vacrelstats->nindexes);
+
+		/* Wait for workers finished vacuum */
+		WaitForParallelWorkersToFinish(pcxt);
+
+		/* Gather the result of vacuum statistics from all workers */
+		lazy_gather_vacuum_stats(pcxt, vacrelstats);
+
+		/* Now we can compute the new value for pg_class.reltuples */
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 vacrelstats->rel_pages,
+															 vacrelstats->scanned_pages,
+															 vacrelstats->scanned_tuples);
+
+		/* Copy new index stats to local memory */
+		memcpy(indstats, lvstate->indstats, sizeof(LVIndStats) * vacrelstats->nindexes);
+
+		DestroyParallelContext(pcxt);
+		ExitParallelMode();
+
+		/* After exit parallel mode, update index statistics */
+		for (i = 0; i < vacrelstats->nindexes; i++)
+		{
+			Relation	ind = Irel[i];
+			LVIndStats *indstat = (LVIndStats *) &(indstats[i]);
+
+			if (indstat->updated)
+			   vac_update_relstats(ind,
+								   indstat->num_pages,
+								   indstat->num_tuples,
+								   0,
+								   false,
+								   InvalidTransactionId,
+								   InvalidMultiXactId,
+								   false);
+		}
+	}
+
+	vac_close_indexes(nindexes, Irel, RowExclusiveLock);
+}
+
+/*
+ * Entry point of parallel vacuum worker.
+ */
+void
+LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc)
+{
+	LVState		*lvstate;
+	Relation rel;
+	Relation *indrel;
+	int nindexes_worker;
+
+	/* Look up dynamic shared memory and initialize */
+	lvstate = lazy_initialize_worker(toc);
+
+	Assert(lvstate != NULL);
+
+	rel = relation_open(lvstate->pscan->phs_relid, ShareUpdateExclusiveLock);
+
+	/* Open all indexes */
+	vac_open_indexes(rel, RowExclusiveLock, &nindexes_worker,
+					 &indrel);
+
+	/* Do lazy vacuum */
+	do_lazy_scan_heap(lvstate, rel, indrel, lvstate->vacrelstats->nindexes,
+					  lvstate->pstate->info.options, lvstate->pstate->info.aggressive);
+
+	vac_close_indexes(lvstate->vacrelstats->nindexes, indrel, RowExclusiveLock);
+	heap_close(rel, ShareUpdateExclusiveLock);
+}
+
+/*
+ *	do_lazy_scan_heap() -- scan an open heap relation
  *
  *		This routine prunes each page in the heap, which will among other
  *		things truncate dead tuples to dead line pointers, defragment the
- *		page, and set commit status bits (see heap_page_prune).  It also builds
+ *		page, and set commit status bits (see heap_page_prune).  It also uses
  *		lists of dead tuples and pages with free space, calculates statistics
  *		on the number of live tuples in the heap, and marks pages as
  *		all-visible if appropriate.  When done, or when we run low on space for
- *		dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
- *		to reclaim dead line pointers.
+ *		dead-tuple TIDs, invoke vacuuming of assigned indexes and call lazy_vacuum_heap
+ *		to reclaim dead line pointers. In parallel vacuum, we need to synchronize
+ *		at where scanning heap finished and vacuuming heap finished. The vacuum
+ *		worker reached to that point first need to wait for other vacuum workers
+ *		reached to the same point.
+ *
+ *		In parallel lazy scan, we get next page number using parallel heap scan.
+ *		Since the dead tuple TIDs are shared with all vacuum workers, we have to
+ *		wait for all other workers to reach to the same points where before starting
+ *		reclaiming dead tuple TIDs and before clearing dead tuple TIDs information
+ *		in dynamic shared memory.
  *
  *		If there are no indexes then we can reclaim line pointers on the fly;
  *		dead line pointers need only be retained until all index pointers that
  *		reference them have been killed.
  */
 static void
-lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive)
+do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irel,
+				  int nindexes, int options, bool aggressive)
 {
-	BlockNumber nblocks,
-				blkno;
+	LVRelStats *vacrelstats = lvstate->vacrelstats;
+	BlockNumber blkno;
+	BlockNumber nblocks;
 	HeapTupleData tuple;
+	LVScanDesc lvscan;
 	char	   *relname;
 	BlockNumber empty_pages,
 				vacuumed_pages;
@@ -471,11 +734,15 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	IndexBulkDeleteResult **indstats;
 	int			i;
 	PGRUsage	ru0;
+#ifdef PLV_TIME
+	PGRUsage	ru_scan;
+	PGRUsage	ru_vacuum;
+#endif
 	Buffer		vmbuffer = InvalidBuffer;
-	BlockNumber next_unskippable_block;
-	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
+	bool		all_visible_according_to_vm = false;
+
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -504,89 +771,24 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->nonempty_pages = 0;
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
-	lazy_space_alloc(vacrelstats, nblocks);
+	lazy_space_alloc(lvstate, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
+	/* Begin heap scan for vacuum */
+	lvscan = lv_beginscan(onerel, lvstate->pscan);
+
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = vacrelstats->max_dead_tuples;
+	initprog_val[2] = lvstate->dtctl->dt_max;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
-	/*
-	 * Except when aggressive is set, we want to skip pages that are
-	 * all-visible according to the visibility map, but only when we can skip
-	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
-	 * sequentially, the OS should be doing readahead for us, so there's no
-	 * gain in skipping a page now and then; that's likely to disable
-	 * readahead and so be counterproductive. Also, skipping even a single
-	 * page means that we can't update relfrozenxid, so we only want to do it
-	 * if we can skip a goodly number of pages.
-	 *
-	 * When aggressive is set, we can't skip pages just because they are
-	 * all-visible, but we can still skip pages that are all-frozen, since
-	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
-	 *
-	 * Before entering the main loop, establish the invariant that
-	 * next_unskippable_block is the next block number >= blkno that we can't
-	 * skip based on the visibility map, either all-visible for a regular scan
-	 * or all-frozen for an aggressive scan.  We set it to nblocks if there's
-	 * no such block.  We also set up the skipping_blocks flag correctly at
-	 * this stage.
-	 *
-	 * Note: The value returned by visibilitymap_get_status could be slightly
-	 * out-of-date, since we make this test before reading the corresponding
-	 * heap page or locking the buffer.  This is OK.  If we mistakenly think
-	 * that the page is all-visible or all-frozen when in fact the flag's just
-	 * been cleared, we might fail to vacuum the page.  It's easy to see that
-	 * skipping a page when aggressive is not set is not a very big deal; we
-	 * might leave some dead tuples lying around, but the next vacuum will
-	 * find them.  But even when aggressive *is* set, it's still OK if we miss
-	 * a page whose all-frozen marking has just been cleared.  Any new XIDs
-	 * just added to that page are necessarily newer than the GlobalXmin we
-	 * computed, so they'll have no effect on the value to which we can safely
-	 * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
-	 *
-	 * We will scan the table's last page, at least to the extent of
-	 * determining whether it has tuples or not, even if it should be skipped
-	 * according to the above rules; except when we've already determined that
-	 * it's not worth trying to truncate the table.  This avoids having
-	 * lazy_truncate_heap() take access-exclusive lock on the table to attempt
-	 * a truncation that just fails immediately because there are tuples in
-	 * the last page.  This is worth avoiding mainly because such a lock must
-	 * be replayed on any hot standby, where it can be disruptive.
-	 */
-	next_unskippable_block = 0;
-	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-	{
-		while (next_unskippable_block < nblocks)
-		{
-			uint8		vmstatus;
-
-			vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
-												&vmbuffer);
-			if (aggressive)
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-					break;
-			}
-			else
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-					break;
-			}
-			vacuum_delay_point();
-			next_unskippable_block++;
-		}
-	}
-
-	if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
-		skipping_blocks = true;
-	else
-		skipping_blocks = false;
-
-	for (blkno = 0; blkno < nblocks; blkno++)
+#ifdef PLV_TIME
+	pg_rusage_init(&ru_scan);
+#endif
+	while((blkno = lazy_scan_get_nextpage(onerel, lvstate, lvscan,
+										  &all_visible_according_to_vm,
+										  &vmbuffer, options, aggressive)) != InvalidBlockNumber)
 	{
 		Buffer		buf;
 		Page		page;
@@ -597,99 +799,31 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		int			prev_dead_count;
 		int			nfrozen;
 		Size		freespace;
-		bool		all_visible_according_to_vm = false;
 		bool		all_visible;
 		bool		all_frozen = true;	/* provided all_visible is also true */
 		bool		has_dead_tuples;
 		TransactionId visibility_cutoff_xid = InvalidTransactionId;
-
-		/* see note above about forcing scanning of last page */
-#define FORCE_CHECK_PAGE() \
-		(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
+		int			dtmax;
+		int			dtcount;
 
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
-		if (blkno == next_unskippable_block)
-		{
-			/* Time to advance next_unskippable_block */
-			next_unskippable_block++;
-			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-			{
-				while (next_unskippable_block < nblocks)
-				{
-					uint8		vmskipflags;
-
-					vmskipflags = visibilitymap_get_status(onerel,
-														   next_unskippable_block,
-														   &vmbuffer);
-					if (aggressive)
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-							break;
-					}
-					else
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-							break;
-					}
-					vacuum_delay_point();
-					next_unskippable_block++;
-				}
-			}
-
-			/*
-			 * We know we can't skip the current block.  But set up
-			 * skipping_blocks to do the right thing at the following blocks.
-			 */
-			if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
-				skipping_blocks = true;
-			else
-				skipping_blocks = false;
-
-			/*
-			 * Normally, the fact that we can't skip this block must mean that
-			 * it's not all-visible.  But in an aggressive vacuum we know only
-			 * that it's not all-frozen, so it might still be all-visible.
-			 */
-			if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
-				all_visible_according_to_vm = true;
-		}
-		else
-		{
-			/*
-			 * The current block is potentially skippable; if we've seen a
-			 * long enough run of skippable blocks to justify skipping it, and
-			 * we're not forced to check it, then go ahead and skip.
-			 * Otherwise, the page must be at least all-visible if not
-			 * all-frozen, so we can set all_visible_according_to_vm = true.
-			 */
-			if (skipping_blocks && !FORCE_CHECK_PAGE())
-			{
-				/*
-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was all-frozen, so we have to recheck; but
-				 * in this case an approximate answer is OK.
-				 */
-				if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
-					vacrelstats->frozenskipped_pages++;
-				continue;
-			}
-			all_visible_according_to_vm = true;
-		}
-
 		vacuum_delay_point();
 
 		/*
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
 		 */
-		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
-			vacrelstats->num_dead_tuples > 0)
+		if (IsDeadTupleShared(lvstate))
+			SpinLockAcquire(&lvstate->dtctl->mutex);
+
+		dtmax = lvstate->dtctl->dt_max;
+		dtcount = lvstate->dtctl->dt_count;
+
+		if (IsDeadTupleShared(lvstate))
+			SpinLockRelease(&lvstate->dtctl->mutex);
+
+		if (((dtmax - dtcount) < MaxHeapTuplesPerPage) && dtcount > 0)
 		{
 			const int	hvp_index[] = {
 				PROGRESS_VACUUM_PHASE,
@@ -697,6 +831,19 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			};
 			int64		hvp_val[2];
 
+#ifdef PLV_TIME
+			elog(WARNING, "%d Scan %s", ParallelWorkerNumber, pg_rusage_show(&ru_scan));
+#endif
+			/*
+			 * Here we're about to vacuum the table and indexes actually. Before
+			 * entering vacuum state, we have to wait for other vacuum worker to
+			 * reach here.
+			 */
+			lazy_prepare_vacuum(lvstate);
+#ifdef PLV_TIME
+			pg_rusage_init(&ru_vacuum);
+#endif
+
 			/*
 			 * Before beginning index vacuuming, we release any pin we may
 			 * hold on the visibility map page.  This isn't necessary for
@@ -716,11 +863,12 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
 
-			/* Remove index entries */
+			/* Remove assigned index entries */
 			for (i = 0; i < nindexes; i++)
-				lazy_vacuum_index(Irel[i],
-								  &indstats[i],
-								  vacrelstats);
+			{
+				if (IsAssignedIndex(i, lvstate->pstate))
+					lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+			}
 
 			/*
 			 * Report that we are now vacuuming the heap.  We also increase
@@ -733,19 +881,28 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
 
 			/* Remove tuples from heap */
-			lazy_vacuum_heap(onerel, vacrelstats);
+			lazy_vacuum_heap(onerel, lvstate);
 
+#ifdef PLV_TIME
+			elog(WARNING, "%d VACUUM : %s", ParallelWorkerNumber, pg_rusage_show(&ru_vacuum));
+#endif
 			/*
-			 * Forget the now-vacuumed tuples, and press on, but be careful
-			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
+			 * Here we've done vacuum on the heap and index and we are going
+			 * to begin the next round scan on heap. Wait until all vacuum worker
+			 * finished vacuum. After all vacuum workers finished, forget the
+			 * now-vacuumed tuples, and press on, but be careful not to reset
+			 * latestRemoveXid since we want that value to be valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
-			vacrelstats->num_index_scans++;
+			lazy_end_vacuum(lvstate);
+#ifdef PLV_TIME
+			pg_rusage_init(&ru_scan);
+#endif
 
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			vacrelstats->num_index_scans++;
 		}
 
 		/*
@@ -771,7 +928,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * it's OK to skip vacuuming pages we get a lock conflict on. They
 			 * will be dealt with in some future vacuum.
 			 */
-			if (!aggressive && !FORCE_CHECK_PAGE())
+			if (!aggressive && !FORCE_CHECK_PAGE(blkno))
 			{
 				ReleaseBuffer(buf);
 				vacrelstats->pinskipped_pages++;
@@ -923,7 +1080,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
+		prev_dead_count = lvstate->dtctl->dt_count;
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -962,7 +1119,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 */
 			if (ItemIdIsDead(itemid))
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				all_visible = false;
 				continue;
 			}
@@ -1067,7 +1224,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			if (tupgone)
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
 													   &vacrelstats->latestRemovedXid);
 				tups_vacuumed += 1;
@@ -1132,13 +1289,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/*
 		 * If there are no indexes then we can vacuum the page right now
-		 * instead of doing a second scan.
+		 * instead of doing a second scan. Because each parallel worker uses its
+		 * own dead tuple area they can vacuum independently.
 		 */
-		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
+		if (Irel == NULL && lvstate->dtctl->dt_count > 0)
 		{
 			/* Remove tuples from heap */
-			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+			lazy_vacuum_page(onerel, blkno, buf, 0, lvstate, &vmbuffer);
 			has_dead_tuples = false;
 
 			/*
@@ -1146,7 +1303,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lvstate->dtctl->dt_count = 0;
+
 			vacuumed_pages++;
 		}
 
@@ -1249,7 +1407,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * page, so remember its free space as-is.  (This path will always be
 		 * taken if there are no indexes.)
 		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		if (lvstate->dtctl->dt_count == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
@@ -1264,10 +1422,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_dead_tuples = nkeep;
 
 	/* now we can compute the new value for pg_class.reltuples */
-	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
-														 nblocks,
-														 vacrelstats->tupcount_pages,
-														 num_tuples);
+	if (!lvstate->parallel_mode)
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 nblocks,
+															 vacrelstats->tupcount_pages,
+															 num_tuples);
 
 	/*
 	 * Release any remaining pin on visibility map page.
@@ -1280,13 +1439,25 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (vacrelstats->num_dead_tuples > 0)
+	if (lvstate->dtctl->dt_count > 0)
 	{
 		const int	hvp_index[] = {
 			PROGRESS_VACUUM_PHASE,
 			PROGRESS_VACUUM_NUM_INDEX_VACUUMS
 		};
 		int64		hvp_val[2];
+#ifdef PLV_TIME
+		elog(WARNING, "%d Scan %s", ParallelWorkerNumber, pg_rusage_show(&ru_scan));
+#endif
+		/*
+		 * Here we're about to vacuum the table and indexes actually. Before
+		 * entering vacuum state, we have to wait for other vacuum worker to
+		 * reach here.
+		 */
+		lazy_prepare_vacuum(lvstate);
+#ifdef PLV_TIME
+		pg_rusage_init(&ru_vacuum);
+#endif
 
 		/* Log cleanup info before we touch indexes */
 		vacuum_log_cleanup_info(onerel, vacrelstats);
@@ -1297,9 +1468,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/* Remove index entries */
 		for (i = 0; i < nindexes; i++)
-			lazy_vacuum_index(Irel[i],
-							  &indstats[i],
-							  vacrelstats);
+		{
+			if (IsAssignedIndex(i, lvstate->pstate))
+				lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+		}
 
 		/* Report that we are now vacuuming the heap */
 		hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
@@ -1309,8 +1481,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		/* Remove tuples from heap */
 		pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 									 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
-		lazy_vacuum_heap(onerel, vacrelstats);
+
+		lazy_vacuum_heap(onerel, lvstate);
+
 		vacrelstats->num_index_scans++;
+#ifdef PLV_TIME
+		elog(WARNING, "%d VACUUM : %s", ParallelWorkerNumber, pg_rusage_show(&ru_vacuum));
+#endif
 	}
 
 	/* report all blocks vacuumed; and that we're cleaning up */
@@ -1320,7 +1497,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
-		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+	{
+		if (IsAssignedIndex(i, lvstate->pstate))
+			lazy_cleanup_index(Irel[i], indstats[i], lvstate->vacrelstats,
+							   lvstate->parallel_mode ? &(lvstate->indstats[i]) : NULL);
+	}
 
 	/* If no indexes, make log report that lazy_vacuum_heap would've made */
 	if (vacuumed_pages)
@@ -1329,12 +1510,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						RelationGetRelationName(onerel),
 						tups_vacuumed, vacuumed_pages)));
 
+	lv_endscan(lvscan);
+
 	/*
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
 	 */
 	initStringInfo(&buf);
 	appendStringInfo(&buf,
+					 "------- worker %d TOTAL stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
 					 _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
 					 nkeep, OldestXmin);
 	appendStringInfo(&buf, _("There were %.0f unused item pointers.\n"),
@@ -1362,6 +1547,35 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pfree(buf.data);
 }
 
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+lazy_gather_vacuum_stats(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	int	i;
+	LVRelStats *lvstats_list;
+
+	lvstats_list = (LVRelStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_VACUUM_STATS, false);
+
+	/* Gather each worker stats */
+	for (i = 0; i < pcxt->nworkers_launched; i++)
+	{
+		LVRelStats *wstats = lvstats_list + sizeof(LVRelStats) * i;
+
+		vacrelstats->scanned_pages += wstats->scanned_pages;
+		vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+		vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+		vacrelstats->scanned_tuples += wstats->scanned_tuples;
+		vacrelstats->new_dead_tuples += wstats->new_dead_tuples;
+		vacrelstats->pages_removed += wstats->pages_removed;
+		vacrelstats->tuples_deleted += wstats->tuples_deleted;
+		vacrelstats->nonempty_pages += wstats->nonempty_pages;
+	}
+
+	/* all vacuum workers have same value of rel_pages */
+	vacrelstats->rel_pages = lvstats_list->rel_pages;
+}
 
 /*
  *	lazy_vacuum_heap() -- second pass over the heap
@@ -1375,18 +1589,23 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
  * process index entry removal in batches as large as possible.
  */
 static void
-lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
+lazy_vacuum_heap(Relation onerel, LVState *lvstate)
 {
 	int			tupindex;
 	int			npages;
 	PGRUsage	ru0;
+	BlockNumber	prev_tblk;
 	Buffer		vmbuffer = InvalidBuffer;
+	ItemPointer	deadtuples = lvstate->deadtuples;
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+	BlockNumber	ntuples = 0;
 
 	pg_rusage_init(&ru0);
 	npages = 0;
 
 	tupindex = 0;
-	while (tupindex < vacrelstats->num_dead_tuples)
+
+	while (tupindex < dtctl->dt_count)
 	{
 		BlockNumber tblk;
 		Buffer		buf;
@@ -1395,7 +1614,40 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 
 		vacuum_delay_point();
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		/*
+		 * If the dead tuple TIDs are shared with all vacuum workers,
+		 * we acquire the lock and advance tupindex before vacuuming.
+		 *
+		 * NB: The number of maximum tuple can be stored into single
+		 * page is not a large number in most cases. We can use spinlock
+		 * here.
+		 */
+		if (IsDeadTupleShared(lvstate))
+		{
+			SpinLockAcquire(&(dtctl->mutex));
+
+			tupindex = dtctl->dt_index;
+
+			if (tupindex >= dtctl->dt_count)
+			{
+				SpinLockRelease(&(dtctl->mutex));
+				break;
+			}
+
+			/* Advance dtct->dt_index */
+			prev_tblk = tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
+			while(prev_tblk == tblk &&
+				  dtctl->dt_index < dtctl->dt_count)
+			{
+				tblk = ItemPointerGetBlockNumber(&deadtuples[dtctl->dt_index]);
+				dtctl->dt_index++;
+				ntuples++;
+			}
+
+			SpinLockRelease(&(dtctl->mutex));
+		}
+
+		tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vac_strategy);
 		if (!ConditionalLockBufferForCleanup(buf))
@@ -1404,7 +1656,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 			++tupindex;
 			continue;
 		}
-		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, lvstate,
 									&vmbuffer);
 
 		/* Now that we've compacted the page, record its available space */
@@ -1422,10 +1674,16 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		vmbuffer = InvalidBuffer;
 	}
 
+#ifdef PLV_TIME
+	elog(WARNING, "%d TABLE %s", ParallelWorkerNumber, pg_rusage_show(&ru0));
+#endif
+	ereport(elevel,
+			(errmsg("------- worker %d VACUUM HEAP stats -------", ParallelWorkerNumber)));
+
 	ereport(elevel,
 			(errmsg("\"%s\": removed %d row versions in %d pages",
 					RelationGetRelationName(onerel),
-					tupindex, npages),
+					ntuples, npages),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1435,34 +1693,32 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
  *
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
- * tuple for this page.  We assume the rest follow sequentially.
- * The return value is the first tupindex after the tuples of this page.
  */
 static int
 lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxOffsetNumber];
 	int			uncnt = 0;
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+	for (; tupindex < lvstate->dtctl->dt_count; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&lvstate->deadtuples[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&lvstate->deadtuples[tupindex]);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
@@ -1587,12 +1843,12 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
  *	lazy_vacuum_index() -- vacuum one index relation.
  *
  *		Delete all the index entries pointing to tuples listed in
- *		vacrelstats->dead_tuples, and update running statistics.
+ *		lvstate->deadtuples, and update running statistics.
  */
 static void
 lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats)
+				  LVState	*lvstate)
 {
 	IndexVacuumInfo ivinfo;
 	PGRUsage	ru0;
@@ -1603,17 +1859,22 @@ lazy_vacuum_index(Relation indrel,
 	ivinfo.analyze_only = false;
 	ivinfo.estimated_count = true;
 	ivinfo.message_level = elevel;
-	ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
+	ivinfo.num_heap_tuples = lvstate->vacrelstats->old_rel_tuples;
 	ivinfo.strategy = vac_strategy;
 
 	/* Do bulk deletion */
-	*stats = index_bulk_delete(&ivinfo, *stats,
-							   lazy_tid_reaped, (void *) vacrelstats);
+	*stats = index_bulk_delete(&ivinfo, *stats, lazy_tid_reaped, (void *) lvstate);
 
+#ifdef PLV_TIME
+	elog(WARNING, "%d INDEX(%d) %s", ParallelWorkerNumber, RelationGetRelid(indrel),
+		 pg_rusage_show(&ru0));
+#endif
+	ereport(elevel,
+			(errmsg("------- worker %d VACUUM INDEX stats -------", ParallelWorkerNumber)));
 	ereport(elevel,
 			(errmsg("scanned index \"%s\" to remove %d row versions",
 					RelationGetRelationName(indrel),
-					vacrelstats->num_dead_tuples),
+					lvstate->dtctl->dt_count),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1621,9 +1882,8 @@ lazy_vacuum_index(Relation indrel,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  */
 static void
-lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats)
+lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat)
 {
 	IndexVacuumInfo ivinfo;
 	PGRUsage	ru0;
@@ -1639,24 +1899,41 @@ lazy_cleanup_index(Relation indrel,
 
 	stats = index_vacuum_cleanup(&ivinfo, stats);
 
+	/* Will be updated by leader process after vacuumed */
+	if (indstat)
+		indstat->updated = false;
+
 	if (!stats)
 		return;
 
 	/*
 	 * Now update statistics in pg_class, but only if the index says the count
-	 * is accurate.
+	 * is accurate. In parallel lazy vacuum, the worker can not update these
+	 * information by itself, so save to DSM and then the launcher process
+	 * updates it later.
 	 */
 	if (!stats->estimated_count)
-		vac_update_relstats(indrel,
-							stats->num_pages,
-							stats->num_index_tuples,
-							0,
-							false,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							false);
+	{
+		if (indstat)
+		{
+			indstat->updated = true;
+			indstat->num_pages = stats->num_pages;
+			indstat->num_tuples = stats->num_index_tuples;
+		}
+		else
+			vac_update_relstats(indrel,
+								stats->num_pages,
+								stats->num_index_tuples,
+								0,
+								false,
+								InvalidTransactionId,
+								InvalidMultiXactId,
+								false);
+	}
 
 	ereport(elevel,
+			(errmsg("------- worker %d CLEANUP INDEX stats -------", ParallelWorkerNumber)));
+	ereport(elevel,
 			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
 					RelationGetRelationName(indrel),
 					stats->num_index_tuples,
@@ -1976,59 +2253,66 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 /*
  * lazy_space_alloc - space allocation decisions for lazy vacuum
  *
+ * In parallel lazy vacuum the space for dead tuple locations are already
+ * allocated in dynamic shared memory, so we allocate space for dead tuple
+ * locations in local memory only when in not parallel lazy vacuum and set
+ * MyDeadTuple.
+ *
  * See the comments at the head of this file for rationale.
  */
 static void
-lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
+lazy_space_alloc(LVState *lvstate, BlockNumber relblocks)
 {
-	long		maxtuples;
-	int			vac_work_mem = IsAutoVacuumWorkerProcess() &&
-	autovacuum_work_mem != -1 ?
-	autovacuum_work_mem : maintenance_work_mem;
-
-	if (vacrelstats->hasindex)
-	{
-		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
-		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+	long maxtuples;
 
-		/* curious coding here to ensure the multiplication can't overflow */
-		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
-			maxtuples = relblocks * LAZY_ALLOC_TUPLES;
+	/*
+	 * In parallel mode, we already set the pointer to dead tuple
+	 * array when initialize.
+	 */
+	if (lvstate->parallel_mode && lvstate->vacrelstats->nindexes > 0)
+		return;
 
-		/* stay sane if small maintenance_work_mem */
-		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
-	}
-	else
-	{
-		maxtuples = MaxHeapTuplesPerPage;
-	}
+	maxtuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
 
-	vacrelstats->num_dead_tuples = 0;
-	vacrelstats->max_dead_tuples = (int) maxtuples;
-	vacrelstats->dead_tuples = (ItemPointer)
-		palloc(maxtuples * sizeof(ItemPointerData));
+	/*
+	 * If in not parallel lazy vacuum, we need to allocate dead
+	 * tuple array in local memory.
+	 */
+	lvstate->deadtuples = palloc0(sizeof(ItemPointerData) * (int)maxtuples);
+	lvstate->dtctl = (LVDeadTupleCtl *) palloc(sizeof(LVDeadTupleCtl));
+	lvstate->dtctl->dt_max = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+	lvstate->dtctl->dt_count = 0;
 }
 
 /*
  * lazy_record_dead_tuple - remember one deletable tuple
+ *
+ * Acquiring the spinlock before remember is required if the dead tuple
+ * TIDs are shared with other vacuum workers.
  */
 static void
-lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr)
+lazy_record_dead_tuple(LVState *lvstate, ItemPointer itemptr)
 {
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockAcquire(&(dtctl->mutex));
+
 	/*
 	 * The array shouldn't overflow under normal behavior, but perhaps it
 	 * could if we are given a really small maintenance_work_mem. In that
 	 * case, just forget the last few tuples (we'll get 'em next time).
 	 */
-	if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
+	if (dtctl->dt_count < dtctl->dt_max)
 	{
-		vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
-		vacrelstats->num_dead_tuples++;
-		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-									 vacrelstats->num_dead_tuples);
+
+		lvstate->deadtuples[dtctl->dt_count] = *itemptr;
+		(dtctl->dt_count)++;
+		/* XXX : Update progress information here */
 	}
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockRelease(&(dtctl->mutex));
 }
 
 /*
@@ -2041,16 +2325,23 @@ lazy_record_dead_tuple(LVRelStats *vacrelstats,
 static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
-	LVRelStats *vacrelstats = (LVRelStats *) state;
+	LVState *lvstate = (LVState *) state;
 	ItemPointer res;
 
+	/*
+	 * We can assume that the dead tuple TIDs are sorted by TID location
+	 * even when we shared the dead tuple TIDs with other vacuum workers.
+	 */
 	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) vacrelstats->dead_tuples,
-								vacrelstats->num_dead_tuples,
+								(void *) lvstate->deadtuples,
+								lvstate->dtctl->dt_count,
 								sizeof(ItemPointerData),
 								vac_cmp_itemptr);
 
-	return (res != NULL);
+	if (res != NULL)
+		return true;
+
+	return false;
 }
 
 /*
@@ -2194,3 +2485,622 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 	return all_visible;
 }
+
+/*
+ * Return the block number we need to scan next, or InvalidBlockNumber if scan
+ * is done.
+ *
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
+ * sequentially, the OS should be doing readahead for us, so there's no
+ * gain in skipping a page now and then; that's likely to disable
+ * readahead and so be counterproductive. Also, skipping even a single
+ * page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * Before entering the main loop, establish the invariant that
+ * next_unskippable_block is the next block number >= blkno that we can't
+ * skip based on the visibility map, either all-visible for a regular scan
+ * or all-frozen for an aggressive scan.  We set it to nblocks if there's
+ * no such block.  We also set up the skipping_blocks flag correctly at
+ * this stage.
+ *
+ * In not parallel mode, before entering the main loop, establish the
+ * invariant that next_unskippable_block is the next block number >= blkno
+ * that's not we can't skip based on the visibility map, either all-visible
+ * for a regular scan or all-frozen for an aggressive scan.  We set it to
+ * nblocks if there's no such block.  We also set up the skipping_blocks
+ * flag correctly at this stage.
+ *
+ * In parallel mode, pstate is not NULL. We scan heap pages
+ * using parallel heap scan description. Each worker calls heap_parallelscan_nextpage()
+ * in order to exclusively get  block number we need to scan at next.
+ * If given block is all-visible according to visibility map, we skip to
+ * scan this block immediately unlike not parallel lazy scan.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding
+ * heap page or locking the buffer.  This is OK.  If we mistakenly think
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page.  It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them.  But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared.  Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * Computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
+ *
+ * We will scan the table's last page, at least to the extent of
+ * determining whether it has tuples or not, even if it should be skipped
+ * according to the above rules; except when we've already determined that
+ * it's not worth trying to truncate the table.  This avoids having
+ * lazy_truncate_heap() take access-exclusive lock on the table to attempt
+ * a truncation that just fails immediately because there are tuples in
+ * the last page.  This is worth avoiding mainly because such a lock must
+ * be replayed on any hot standby, where it can be disruptive.
+ */
+static BlockNumber
+lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+					   LVScanDesc lvscan, bool *all_visible_according_to_vm,
+					   Buffer *vmbuffer, int options, bool aggressive)
+{
+	BlockNumber blkno;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+
+	if (lvstate->parallel_mode)
+	{
+		/*
+		 * In parallel lazy vacuum since it's hard to know how many consecutive
+		 * all-visible pages exits on table we skip to scan the heap page immediately.
+		 * if it is all-visible page.
+		 */
+		while ((blkno = heap_parallelscan_nextpage(lvscan->heapscan)) != InvalidBlockNumber)
+		{
+			*all_visible_according_to_vm = false;
+			vacuum_delay_point();
+
+			/* Consider to skip scan page according visibility map */
+			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0 &&
+				!FORCE_CHECK_PAGE(blkno))
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, blkno, vmbuffer);
+
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+					{
+						vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+					else if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+						*all_visible_according_to_vm = true;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+					{
+						if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+							vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+				}
+			}
+
+			/* We need to scan current blkno, break */
+			break;
+		}
+	}
+	else
+	{
+		bool skipping_blocks = false;
+
+		/* Initialize lv_nextunskippable_page if needed */
+		if (lvscan->lv_cblock == 0 && (options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+		{
+			while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, lvscan->lv_next_unskippable_block,
+													vmbuffer);
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+						break;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+						break;
+				}
+				vacuum_delay_point();
+				lvscan->lv_next_unskippable_block++;
+			}
+
+			if (lvscan->lv_next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+				skipping_blocks = true;
+			else
+				skipping_blocks = false;
+		}
+
+		/* Decide the block number we need to scan */
+		for (blkno = lvscan->lv_cblock; blkno < lvscan->lv_nblocks; blkno++)
+		{
+			if (blkno == lvscan->lv_next_unskippable_block)
+			{
+				/* Time to advance next_unskippable_block */
+				lvscan->lv_next_unskippable_block++;
+				if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+				{
+					while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+					{
+						uint8		vmstatus;
+
+						vmstatus = visibilitymap_get_status(onerel,
+															lvscan->lv_next_unskippable_block,
+															vmbuffer);
+						if (aggressive)
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+								break;
+						}
+						else
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+								break;
+						}
+						vacuum_delay_point();
+						lvscan->lv_next_unskippable_block++;
+					}
+				}
+
+				/*
+				 * We know we can't skip the current block.  But set up
+				 * skipping_all_visible_blocks to do the right thing at the
+				 * following blocks.
+				 */
+				if (lvscan->lv_next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
+					skipping_blocks = true;
+				else
+					skipping_blocks = false;
+
+				/*
+				 * Normally, the fact that we can't skip this block must mean that
+				 * it's not all-visible.  But in an aggressive vacuum we know only
+				 * that it's not all-frozen, so it might still be all-visible.
+				 */
+				if (aggressive && VM_ALL_VISIBLE(onerel, blkno, vmbuffer))
+					*all_visible_according_to_vm = true;
+
+				/* Found out that next unskippable block number */
+				break;
+			}
+			else
+			{
+				/*
+				 * The current block is potentially skippable; if we've seen a
+				 * long enough run of skippable blocks to justify skipping it, and
+				 * we're not forced to check it, then go ahead and skip.
+				 * Otherwise, the page must be at least all-visible if not
+				 * all-frozen, so we can set all_visible_according_to_vm = true.
+				 */
+				if (skipping_blocks && !FORCE_CHECK_PAGE(blkno))
+				{
+					/*
+					 * Tricky, tricky.  If this is in aggressive vacuum, the page
+					 * must have been all-frozen at the time we checked whether it
+					 * was skippable, but it might not be any more.  We must be
+					 * careful to count it as a skipped all-frozen page in that
+					 * case, or else we'll think we can't update relfrozenxid and
+					 * relminmxid.  If it's not an aggressive vacuum, we don't
+					 * know whether it was all-frozen, so we have to recheck; but
+					 * in this case an approximate answer is OK.
+					 */
+					if (aggressive || VM_ALL_FROZEN(onerel, blkno, vmbuffer))
+						vacrelstats->frozenskipped_pages++;
+					continue;
+				}
+
+				*all_visible_according_to_vm = true;
+
+				/* We need to scan current blkno, break */
+				break;
+			}
+		} /* for */
+
+		/* Advance the current block number for the next scan */
+		lvscan->lv_cblock = blkno + 1;
+	}
+
+	return (blkno == lvscan->lv_nblocks) ? InvalidBlockNumber : blkno;
+}
+
+/*
+ * Begin lazy vacuum scan. lvscan->heapscan is NULL if
+ * we're not in parallel lazy vacuum.
+ */
+static LVScanDesc
+lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan)
+{
+	LVScanDesc lvscan;
+
+	lvscan = (LVScanDesc) palloc(sizeof(LVScanDescData));
+
+	lvscan->lv_cblock = 0;
+	lvscan->lv_next_unskippable_block = 0;
+	lvscan->lv_nblocks = RelationGetNumberOfBlocks(onerel);
+
+	if (pscan != NULL)
+		lvscan->heapscan = heap_beginscan_parallel(onerel, pscan);
+	else
+		lvscan->heapscan = NULL;
+
+	return lvscan;
+}
+
+/*
+ * End lazy vacuum scan.
+ */
+static void
+lv_endscan(LVScanDesc lvscan)
+{
+	if (lvscan->heapscan != NULL)
+		heap_endscan(lvscan->heapscan);
+	pfree(lvscan);
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Lazy Vacuum Support
+ * ----------------------------------------------------------------
+ */
+
+/*
+ * Estimate storage for parallel lazy vacuum.
+ */
+static void
+lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	Size size = 0;
+	int keys = 0;
+	int vacuum_workers = pcxt->nworkers + 1;
+	long maxtuples = lazy_get_max_dead_tuples(vacrelstats);
+
+	/* Estimate size for parallel heap scan */
+	size += heap_parallelscan_estimate(SnapshotAny);
+	keys++;
+
+	/* Estimate size for vacuum statistics for only workers*/
+	size += BUFFERALIGN(mul_size(sizeof(LVRelStats), pcxt->nworkers));
+	keys++;
+
+	/* We have to share dead tuple information only when the table has indexes */
+	if (vacrelstats->nindexes > 0)
+	{
+		/* Estimate size for index statistics */
+		size += BUFFERALIGN(mul_size(sizeof(LVIndStats), vacrelstats->nindexes));
+		keys++;
+
+		/* Estimate size for dead tuple control */
+		size += BUFFERALIGN(sizeof(LVDeadTupleCtl));
+		keys++;
+
+		/* Estimate size for dead tuple array */
+		size += BUFFERALIGN(mul_size(
+							 mul_size(sizeof(ItemPointerData), maxtuples),
+							 vacuum_workers));
+		keys++;
+	}
+
+	/* Estimate size for parallel lazy vacuum state */
+	size += BUFFERALIGN(sizeof(LVParallelState));
+	keys++;
+
+	/* Estimate size for vacuum task */
+	size += BUFFERALIGN(sizeof(VacuumInfo));
+	keys++;
+
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, keys);
+}
+
+/*
+ * Initialize dynamic shared memory for parallel lazy vacuum. We store
+ * relevant informations of parallel heap scanning, dead tuple array
+ * and vacuum statistics for each worker and some parameters for lazy vacuum.
+ */
+static void
+lazy_initialize_dsm(ParallelContext *pcxt, Relation onerel, LVState *lvstate,
+					int options, bool aggressive)
+{
+	ParallelHeapScanDesc pscan_ptr;
+	ItemPointer	deadtuples_ptr;
+	char 		*lvrelstats_ptr;
+	LVParallelState *pstate_ptr;
+	LVIndStats	*indstats_ptr;
+	LVDeadTupleCtl	*dtctl_ptr;
+	int i;
+	int deadtuples_size;
+	int lvrelstats_size;
+	int	vacuum_workers = pcxt->nworkers + 1;
+	long max_tuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+
+	/* Allocate and initialize DSM for vacuum stats for each worker */
+	lvrelstats_size = mul_size(sizeof(LVRelStats), pcxt->nworkers);
+	lvrelstats_ptr = shm_toc_allocate(pcxt->toc, lvrelstats_size);
+	for (i = 0; i < pcxt->nworkers; i++)
+	{
+		char *start;
+
+		start = lvrelstats_ptr + i * sizeof(LVRelStats);
+		memcpy(start, lvstate->vacrelstats, sizeof(LVRelStats));
+	}
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_STATS, lvrelstats_ptr);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Allocate and initialize DSM for dead tuple control */
+		dtctl_ptr = (LVDeadTupleCtl *) shm_toc_allocate(pcxt->toc, sizeof(LVDeadTupleCtl));
+		SpinLockInit(&(dtctl_ptr->mutex));
+		dtctl_ptr->dt_max = max_tuples * vacuum_workers;
+		dtctl_ptr->dt_count = 0;
+		dtctl_ptr->dt_index = 0;
+		lvstate->dtctl = dtctl_ptr;
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLE_CTL, dtctl_ptr);
+
+		/* Allocate and initialize DSM for dead tuple array */
+		deadtuples_size = mul_size(mul_size(sizeof(ItemPointerData), max_tuples),
+								   vacuum_workers);
+		deadtuples_ptr = (ItemPointer) shm_toc_allocate(pcxt->toc,
+														deadtuples_size);
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLES, deadtuples_ptr);
+		lvstate->deadtuples = deadtuples_ptr;
+
+		/* Allocate DSM for index statistics */
+		indstats_ptr = (LVIndStats *) shm_toc_allocate(pcxt->toc,
+													   mul_size(sizeof(LVIndStats),
+																lvstate->vacrelstats->nindexes));
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_INDEX_STATS, indstats_ptr);
+		lvstate->indstats = indstats_ptr;
+	}
+
+	/* Allocate and initialize DSM for parallel scan description */
+	pscan_ptr = (ParallelHeapScanDesc) shm_toc_allocate(pcxt->toc,
+														heap_parallelscan_estimate(SnapshotAny));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_SCAN, pscan_ptr);
+	heap_parallelscan_initialize(pscan_ptr, onerel, SnapshotAny);
+	lvstate->pscan = pscan_ptr;
+
+	/* Allocate and initialize DSM for parallel vacuum state */
+	pstate_ptr = (LVParallelState *) shm_toc_allocate(pcxt->toc, sizeof(LVParallelState));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_STATE, pstate_ptr);
+
+	ConditionVariableInit(&(pstate_ptr->cv));
+	SpinLockInit(&(pstate_ptr->mutex));
+	pstate_ptr->nworkers = vacuum_workers;
+	pstate_ptr->state = VACSTATE_SCAN;
+	pstate_ptr->info.aggressive = aggressive;
+	pstate_ptr->info.options = options;
+	pstate_ptr->info.oldestxmin = OldestXmin;
+	pstate_ptr->info.freezelimit = FreezeLimit;
+	pstate_ptr->info.multixactcutoff = MultiXactCutoff;
+	pstate_ptr->info.elevel = elevel;
+	lvstate->pstate = pstate_ptr;
+}
+
+/*
+ * Initialize parallel lazy vacuum for worker.
+ */
+static LVState *
+lazy_initialize_worker(shm_toc *toc)
+{
+	LVState	*lvstate;
+	char *lvstats;
+
+	lvstate = (LVState *) palloc(sizeof(LVState));
+	lvstate->parallel_mode = true;
+
+	/* Set up vacuum stats */
+	lvstats = shm_toc_lookup(toc, VACUUM_KEY_VACUUM_STATS, false);
+	lvstate->vacrelstats = (LVRelStats *) (lvstats +
+										   sizeof(LVRelStats) * ParallelWorkerNumber);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Set up dead tuple control */
+		lvstate->dtctl = (LVDeadTupleCtl *) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLE_CTL, false);
+
+		/* Set up dead tuple array */
+		lvstate->deadtuples = (ItemPointer) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLES, false);
+
+		/* Set up index statistics */
+		lvstate->indstats = (LVIndStats *) shm_toc_lookup(toc, VACUUM_KEY_INDEX_STATS, false);
+	}
+
+	/* Set up parallel vacuum state */
+	lvstate->pstate = (LVParallelState *) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_STATE, false);
+
+	/* Set up parallel heap scan description */
+	lvstate->pscan = (ParallelHeapScanDesc) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_SCAN, false);
+
+	/* Set up parameters for lazy vacuum */
+	OldestXmin = lvstate->pstate->info.oldestxmin;
+	FreezeLimit = lvstate->pstate->info.freezelimit;
+	MultiXactCutoff = lvstate->pstate->info.multixactcutoff;
+	elevel = lvstate->pstate->info.elevel;
+
+	return lvstate;
+}
+
+/*
+ * In the end of actual vacuumming on table and indexes actually, we have
+ * to wait for other all vacuum workers to reach here before clearing dead
+ * tuple TIDs information.
+ */
+static void
+lazy_end_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+	{
+		lvstate->dtctl->dt_count = 0;
+		return;
+	}
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		/* Fetch shared information */
+		if (!counted)
+			pstate->finish_count++;
+		finish_count = pstate->finish_count;
+		state = pstate->state;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_SCAN)
+			break;
+
+		/*
+		 * Wake up other workers if counted up if first time to reach here and
+		 * is a parallel worker.
+		 */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * If all launched parallel vacuum workers reached here, we can clear the
+		 * dead tuple TIDs information.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			/* Clear dead tuples */
+			lvstate->dtctl->dt_count = 0;
+
+			/* need spinlock ? */
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_SCAN;
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_DONE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Before starting actual vacuuming on table and indexes, we have to wait for
+ * other all vacuum workers so that all worker can see the same dead tuple TIDs
+ * information when vacuuming.
+ */
+static void
+lazy_prepare_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+		return;
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		if (!counted)
+			pstate->finish_count++;
+		state = pstate->state;
+		finish_count = pstate->finish_count;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_VACUUM)
+			break;
+
+		/*
+		 * Wake up other workers if counted up if first time to reach here and
+		 * is a parallel worker.
+		 */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * The leader process can change parallel vacuum state if all workers
+		 * have reached here.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			qsort((void *) lvstate->deadtuples, lvstate->dtctl->dt_count,
+				  sizeof(ItemPointerData), vac_cmp_itemptr);
+
+			/* XXX: need spinlock ? */
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_VACUUM;
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_PREPARE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Return the number of maximum dead tuples can be stored according
+ * to vac_work_mem.
+ */
+static long
+lazy_get_max_dead_tuples(LVRelStats *vacrelstats)
+{
+	long maxtuples;
+	int	vac_work_mem = IsAutoVacuumWorkerProcess() &&
+		autovacuum_work_mem != -1 ?
+		autovacuum_work_mem : maintenance_work_mem;
+
+	if (vacrelstats->nindexes != 0)
+	{
+		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+		maxtuples = Min(maxtuples, INT_MAX);
+		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+
+		/* curious coding here to ensure the multiplication can't overflow */
+		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > vacrelstats->old_rel_pages)
+			maxtuples = vacrelstats->old_rel_pages * LAZY_ALLOC_TUPLES;
+
+		/* stay sane if small maintenance_work_mem */
+		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+	}
+	else
+	{
+		maxtuples = MaxHeapTuplesPerPage;
+	}
+
+	return maxtuples;
+}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 8d92c03..b258f62 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1662,7 +1662,12 @@ _equalDropdbStmt(const DropdbStmt *a, const DropdbStmt *b)
 static bool
 _equalVacuumStmt(const VacuumStmt *a, const VacuumStmt *b)
 {
-	COMPARE_SCALAR_FIELD(options);
+	if (a->options.flags != b->options.flags)
+		return false;
+
+	if (a->options.nworkers != b->options.nworkers)
+		return false;
+
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_NODE_FIELD(va_cols);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d0de99..4620e8b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -187,6 +187,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
 			   bool *deferrable, bool *initdeferred, bool *not_valid,
 			   bool *no_inherit, core_yyscan_t yyscanner);
 static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
+static VacuumOptions *makeVacOpt(VacuumOption flag, int nworkers);
 
 %}
 
@@ -237,6 +238,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	struct ImportQual	*importqual;
 	InsertStmt			*istmt;
 	VariableSetStmt		*vsetstmt;
+	VacuumOptions		*vacopts;
 	PartitionElem		*partelem;
 	PartitionSpec		*partspec;
 	PartitionBoundSpec	*partboundspec;
@@ -305,7 +307,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 				create_extension_opt_item alter_extension_opt_item
 
 %type <ival>	opt_lock lock_type cast_context
-%type <ival>	vacuum_option_list vacuum_option_elem
+%type <vacopts>	vacuum_option_list vacuum_option_elem
 %type <boolean>	opt_or_replace
 				opt_grant_grant_option opt_grant_admin_option
 				opt_nowait opt_if_exists opt_with_data
@@ -10121,47 +10123,59 @@ cluster_index_specification:
 VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose qualified_name
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = $5;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose AnalyzeStmt
 				{
 					VacuumStmt *n = (VacuumStmt *) $5;
-					n->options |= VACOPT_VACUUM;
+					n->options.flags |= VACOPT_VACUUM;
 					if ($2)
-						n->options |= VACOPT_FULL;
+						n->options.flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						n->options.flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						n->options.flags |= VACOPT_VERBOSE;
+					n->options.nworkers = 0;
 					$$ = (Node *)n;
 				}
 			| VACUUM '(' vacuum_option_list ')'
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions *vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *) n;
@@ -10169,29 +10183,52 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 			| VACUUM '(' vacuum_option_list ')' qualified_name opt_name_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions *vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->relation = $5;
 					n->va_cols = $6;
 					if (n->va_cols != NIL)	/* implies analyze */
-						n->options |= VACOPT_ANALYZE;
+						n->options.flags |= VACOPT_ANALYZE;
 					$$ = (Node *) n;
 				}
 		;
 
 vacuum_option_list:
 			vacuum_option_elem								{ $$ = $1; }
-			| vacuum_option_list ',' vacuum_option_elem		{ $$ = $1 | $3; }
+			| vacuum_option_list ',' vacuum_option_elem
+			{
+				VacuumOptions *vacopts1 = (VacuumOptions *)$1;
+				VacuumOptions *vacopts2 = (VacuumOptions *)$3;
+
+				vacopts1->flags |= vacopts2->flags;
+				if (vacopts2->flags == VACOPT_PARALLEL)
+					vacopts1->nworkers = vacopts2->nworkers;
+
+				$$ = vacopts1;
+				pfree(vacopts2);
+			}
 		;
 
 vacuum_option_elem:
-			analyze_keyword		{ $$ = VACOPT_ANALYZE; }
-			| VERBOSE			{ $$ = VACOPT_VERBOSE; }
-			| FREEZE			{ $$ = VACOPT_FREEZE; }
-			| FULL				{ $$ = VACOPT_FULL; }
+			analyze_keyword		{ $$ = makeVacOpt(VACOPT_ANALYZE, 0); }
+			| VERBOSE			{ $$ = makeVacOpt(VACOPT_VERBOSE, 0); }
+			| FREEZE			{ $$ = makeVacOpt(VACOPT_FREEZE, 0); }
+			| FULL				{ $$ = makeVacOpt(VACOPT_FULL, 0); }
+			| PARALLEL ICONST
+				{
+					if ($2 < 1)
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("parallel vacuum degree must be more than 1"),
+								 parser_errposition(@1)));
+					$$ = makeVacOpt(VACOPT_PARALLEL, $2);
+				}
 			| IDENT
 				{
 					if (strcmp($1, "disable_page_skipping") == 0)
-						$$ = VACOPT_DISABLE_PAGE_SKIPPING;
+						$$ = makeVacOpt(VACOPT_DISABLE_PAGE_SKIPPING, 1);
 					else
 						ereport(ERROR,
 								(errcode(ERRCODE_SYNTAX_ERROR),
@@ -10199,27 +10236,36 @@ vacuum_option_elem:
 									 parser_errposition(@1)));
 				}
 		;
-
 AnalyzeStmt:
 			analyze_keyword opt_verbose
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| analyze_keyword opt_verbose qualified_name opt_name_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = $3;
 					n->va_cols = $4;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 		;
 
@@ -15903,6 +15949,16 @@ makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
 	return (Node *) s;
 }
 
+static VacuumOptions *
+makeVacOpt(VacuumOption flag, int nworkers)
+{
+	VacuumOptions *vacopt = palloc(sizeof(VacuumOptions));
+
+	vacopt->flags = flag;
+	vacopt->nworkers = nworkers;
+	return vacopt;
+}
+
 /* parser_init()
  * Initialize to parse one query string
  */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 00b1e82..cf4d61f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -187,7 +187,7 @@ typedef struct av_relation
 typedef struct autovac_table
 {
 	Oid			at_relid;
-	int			at_vacoptions;	/* bitmask of VacuumOption */
+	VacuumOptions at_vacoptions;	/* contains bitmask of VacuumOption */
 	VacuumParams at_params;
 	int			at_vacuum_cost_delay;
 	int			at_vacuum_cost_limit;
@@ -2483,7 +2483,7 @@ do_autovacuum(void)
 			 * next table in our list.
 			 */
 			HOLD_INTERRUPTS();
-			if (tab->at_vacoptions & VACOPT_VACUUM)
+			if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 				errcontext("automatic vacuum of table \"%s.%s.%s\"",
 						   tab->at_datname, tab->at_nspname, tab->at_relname);
 			else
@@ -2899,10 +2899,11 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab = palloc(sizeof(autovac_table));
 		tab->at_relid = relid;
 		tab->at_sharedrel = classForm->relisshared;
-		tab->at_vacoptions = VACOPT_SKIPTOAST |
+		tab->at_vacoptions.flags = VACOPT_SKIPTOAST |
 			(dovacuum ? VACOPT_VACUUM : 0) |
 			(doanalyze ? VACOPT_ANALYZE : 0) |
 			(!wraparound ? VACOPT_NOWAIT : 0);
+		tab->at_vacoptions.nworkers = 1;
 		tab->at_params.freeze_min_age = freeze_min_age;
 		tab->at_params.freeze_table_age = freeze_table_age;
 		tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
@@ -3149,10 +3150,10 @@ autovac_report_activity(autovac_table *tab)
 	int			len;
 
 	/* Report the command and possible options */
-	if (tab->at_vacoptions & VACOPT_VACUUM)
+	if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: VACUUM%s",
-				 tab->at_vacoptions & VACOPT_ANALYZE ? " ANALYZE" : "");
+				 tab->at_vacoptions.flags & VACOPT_ANALYZE ? " ANALYZE" : "");
 	else
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: ANALYZE");
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 1f75e2e..ba61fc3 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3606,6 +3606,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
 			event_name = "ParallelBitmapScan";
 			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_PREPARE:
+			event_name = "ParallelVacuumPrepare";
+			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_DONE:
+			event_name = "ParallelVacuumDone";
+			break;
 		case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
 			event_name = "ProcArrayGroupUpdate";
 			break;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 775477c..3669d48 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -669,7 +669,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				VacuumStmt *stmt = (VacuumStmt *) parsetree;
 
 				/* we choose to allow this during "read only" transactions */
-				PreventCommandDuringRecovery((stmt->options & VACOPT_VACUUM) ?
+				PreventCommandDuringRecovery((stmt->options.flags & VACOPT_VACUUM) ?
 											 "VACUUM" : "ANALYZE");
 				/* forbidden in parallel mode due to CommandIsReadOnly */
 				ExecVacuum(stmt, isTopLevel);
@@ -2498,7 +2498,7 @@ CreateCommandTag(Node *parsetree)
 			break;
 
 		case T_VacuumStmt:
-			if (((VacuumStmt *) parsetree)->options & VACOPT_VACUUM)
+			if (((VacuumStmt *) parsetree)->options.flags & VACOPT_VACUUM)
 				tag = "VACUUM";
 			else
 				tag = "ANALYZE";
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 08a08c8..3c2d5df 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2040,7 +2040,6 @@ EstimateSnapshotSpace(Snapshot snap)
 	Size		size;
 
 	Assert(snap != InvalidSnapshot);
-	Assert(snap->satisfies == HeapTupleSatisfiesMVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
 	size = add_size(sizeof(SerializedSnapshotData),
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index b2132e7..248a670 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -131,6 +131,7 @@ extern Size heap_parallelscan_estimate(Snapshot snapshot);
 extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
 							 Relation relation, Snapshot snapshot);
 extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
+extern BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index a903511..1fc10bf 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "access/heapam.h"
+#include "access/parallel.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -157,7 +159,7 @@ extern int	vacuum_multixact_freeze_table_age;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
-extern void vacuum(int options, RangeVar *relation, Oid relid,
+extern void vacuum(VacuumOptions options, RangeVar *relation, Oid relid,
 	   VacuumParams *params, List *va_cols,
 	   BufferAccessStrategy bstrategy, bool isTopLevel);
 extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
@@ -188,8 +190,9 @@ extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
 
 /* in commands/vacuumlazy.c */
-extern void lazy_vacuum_rel(Relation onerel, int options,
+extern void lazy_vacuum_rel(Relation onerel, VacuumOptions options,
 				VacuumParams *params, BufferAccessStrategy bstrategy);
+extern void LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc);
 
 /* in commands/analyze.c */
 extern void analyze_rel(Oid relid, RangeVar *relation, int options,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 5f2a4a7..d81809b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3087,13 +3087,20 @@ typedef enum VacuumOption
 	VACOPT_FULL = 1 << 4,		/* FULL (non-concurrent) vacuum */
 	VACOPT_NOWAIT = 1 << 5,		/* don't wait to get lock (autovacuum only) */
 	VACOPT_SKIPTOAST = 1 << 6,	/* don't process the TOAST table, if any */
-	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7	/* don't skip any pages */
+	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7,	/* don't skip any pages */
+	VACOPT_PARALLEL = 1 << 8	/* do VACUUM parallelly */
 } VacuumOption;
 
+typedef struct VacuumOptions
+{
+	VacuumOption flags; /* OR of VacuumOption flags */
+	int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;
+
 typedef struct VacuumStmt
 {
 	NodeTag		type;
-	int			options;		/* OR of VacuumOption flags */
+	VacuumOptions	options;
 	RangeVar   *relation;		/* single table to process, or NULL */
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumStmt;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index cb05d9b..f52fdd8 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -811,6 +811,8 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_PARALLEL_BITMAP_SCAN,
+	WAIT_EVENT_PARALLEL_VACUUM_PREPARE,
+	WAIT_EVENT_PARALLEL_VACUUM_DONE,
 	WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
 	WAIT_EVENT_REPLICATION_SLOT_DROP,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 6f68663..8887f4d 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -80,6 +80,7 @@ CONTEXT:  SQL function "do_analyze" statement 1
 SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 -- partitioned table
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 7c5fb04..cbd8c44 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -61,6 +61,7 @@ VACUUM FULL vaccluster;
 VACUUM FULL vactst;
 
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
 
 DROP TABLE vaccluster;
 DROP TABLE vactst;

result_0_to_16_degree.pngimage/png; name=result_0_to_16_degree.pngDownload

�PNG


IHDR��,�APLTE���������������@��  ��� �@����@����`��`��`��@��0`��`@@@@���`�``�`���`���@��`��`�`�����` �```    @@ @�`� `�``����@ � ����������  ���`����`�����`�@�@@�����`���������������������``������������� ����  � �� �  �@ �@��`��`�������@��@��`��p��������������������___???ccc�/�a�IDATx������*E���������(�+@����M�SmD���A�(���T���(E]�O�MU���zV5mYO�NU6�3�
}g< :��MY
��X��Y�Z5�����b�\��N�N�
������EWC�B���	.�Dp�m��m��"�G!����-�������'_(��G%��*�f�������x�e�2�{T�}��,E�����WO�+PFu�J����%�$�� `��E�h���	8`�)D�*�^��lE)���f	4]YvJ@�'��F~�_);�L�2�u����D�/��W�9P�2"��^���3������Qi�/��{��^m���MVg���R%���
�R��
6��OsH�����|����������%S@cZ�F�Zn��;�U�7����x�O+�������LYt��(�H���� �n���OYG����0F���Z�zjSwx��CIY</�M�.��7�v��C�Z7�g���Ha' ��	#��9G�N@��)�D0a����9�	g�C��#��d�f�����{��K���S��xi�����`b#�w3���]�	�>`�����_���)N����������S��P�������,�^����+VC4�%�g�K3�)��U��2��@@���;#��(���;��^��L1����j�U�������L]k�B@s���3�c��z%�k�r�� /"�u��i�1��-��6����b�Ex�����b��3�7������y��y7���>��(y	���'O�:��1��#B8D�	h�a�ng1w�6H�|�!V�s�H�	��b�S��z�i�)�[�fD�D0MD6P�{
I�)�[ `��'�|��Dgu�}���3Z@kh��>��t7%�@z�)Z�q�
��B#[���}����G@�����V�U[�����Bsp�U�;����&�Te7=i����)��M)������0����f%�
����&��u���X�`' H�������o��R8x���A�����)���=����}@���_Ng��;S���W���u?{D0M�������?� ��: ��0��� /   R�	�>`� `�P.��6@����������*��{�P���*��0s�G�?~��^�LrvQ�3��	�0�a&9�(�������0S�5�$���6#��x��4S$cZ�LrnQ�k3��+ ��I"�f�����?���X��yF�5��-�mF�s�� 	�f���E����'P_c���l����&8��|�Q���>���	)  �&��1O����'D�p�>`���	�W�u.��@@���	��(�)�?�gO�<!�%G��fO�<!� ^���&�TyB��g O	��`�?��xyB�� �<!��T4�(����[%�Z.D�O��B@*D_�x��j�*�������h  �����<�[[@��DVR�����FD/Wb��s�?���Q��AN:(G�"����`9�jZ&:�����/�����9���Xa' ��I"��G#���&�h�g�@s��+ H�����s�! �g�U��|H��2}��@��#`' �H��YC�`' ��	#�s�0�\@��y)0k�{�D0a�A `�����	���}��E��9G�N@��^�J��2����<5�	�����r�u�^=����S�> ��"����K�Kyj��]�Q��Gm��Y@��;���"`#�t���8! �Dp~t��jMW���^@�3�Zw�]=�=����0��q��m�<q7�:Q�,������$��$y����^�VM�|�
B�gLp������R+/�S�#O
#�����N���;��@@&������?�Rg�p���0��7}�ALF��d����RA�`' ��	#�s�0��� /   f
}/����&�0(
3�{��	�R `����	�>`��"@��#`' �&��\��YC�`' ��	#�s�0��� /   ���0��^;�LaP�s��y)����A��h  ���hyB�� GyB4�D!Z��%=�a��h`FuOa@�<!���'��U�
��hyB������h `�D�2fg8N��������	�+ ��	#�-O�Al�	�6�3���q	�<D��A8C�<!Kz��<!�#���,�A���h `z�����,�A��h�����#��f2$�0��� /   ���00=��v��0��0��	����M�����B��,`���;���FB��^�U+�^UU��:�=�daP��v����e�����w+
����#�(`o��Mx�h����(��Q��6�Y���JA��^�u�A@!��~���-���#�p]��)�������n!`�n	(g����u�D�g���?��[����r8	�e{���wC����6u#o>����n?�&�0(��oBv~�da��oB��&�*�aw�&8;�6�])��n����! �\�����.�t,��^���y�����]�4}��E�4%����#`' �H��'��C��8�'D�B���<[?��}���<!��F���;c`�4[Uk�����dT���phQ��?�jmr���%z5h�3��l�N.-����r<�'D3��|�JMJ�w��>���&��i40)���[3���|�_T���Zt�0�h�
�>`��"D8V�y���	�������5�	I_�$���g���J���������	��`=�7iF�*�c�0�x���2��|6L�7��� �����R@��<���l�K���r\@�d���@�����#�p]�Z��]��x�0m�7F�}���[4��q�:`��B����8Q}�7�������L}/���l�� �j�.YT���0(� �@@����{�(!�J��� ��A����=a`*�{7���B���b,�S@��G�F\��&�{��Eu�����0����*�7!{
��p&���~` ���6�\��f
�@��(`��z��j�>`��"\0Lp���e����>�����-�`Ppf�U�&� r��pU@�k[��uh�����9aP��E�*`U���1�uuOa]��	F���@�]@�^�"�N�E�x#�m�E�=}�AHW�w�)���os*����#`'���/���0��D�b���	9# �	}/ Z��%=�N���VQ?���uOaP��NyB�� ;yB�8�����0 |<.I>����D/����3���6�n��;Q�yBV)��F��C@�'�����~��6GuOaP��<
M��'��\F�O�<!iK����i�N�<!�!h��'��S��%=�n��s���\�)O��d7O$��}�9O��d7O��@OQ?/��G�t2��������{��	����r���!�;��YC�`'�o:�����0(�9�zO��A�N& |Op���@���G*�����0(��t�����0�����~�	�@2�{�<u��1���0(�3��Z����r\�~����
@�`'����j�,uOaP^9�i0�����0����@3�t. |:L��4���	x� n������#��|o������#�"��6���y����	�ADX���DU����yBL�'�����X1
X�e����L�<��w3�{��"�k��B��	1
�{xnY�=}��WH-6�/���]�V��DY�|%�n��o���$�����|���{��U�{��EWc����_�e ���E�&��-=M��'�0G����<��<!z��'d�����$J,A+�0��U���<�b3$����-��V���5i7O�Ia�^�����e)d���'�@	�����a��o��"��� ��8��7r���a;��f�1�S@�^* <}/�����P�w�e�A��G��jY�=}��2;<^F10�[��@����P��v��
��<�}���>��$g��a�N��9Yn�
0���"�i�K�A���?4��!i�]
�i��8�A��:��3`���t@���:O�R�?�������N���'��d�`w�5MG����� }��G@v�w�LV��A[�	���3`��l�0��$�;�/a&������!�k���C/���8�@�ST��v�	���T0��uC�����A�O��$��}�C�K�h�Iy���V�+� ������IBB�`'��A����I=���"�NHux� ,���v��S� $�no���ZaM�m1�S���L#������h�j��4{�j,O��7r��7r9H�yar���f�O������+c��� ���A�S��%?�~����p�K����dDKT3/
}�F�m�wG�`L��i��?JT���;H���)ix<����A|�_��F�A����c.
�1��A�Onsc��� �'�A���&�HT�[|���/mz������|�C�D5g!�W��������i��)i�Os���+��&�{��,�i���^?�{+j}Y�[�?I�J���f�Os���;����}=�`��X���F8��"������J+i*.��$��q�7x6>��G���i�S���Q����60M��s�e��$�_L\�]�N�
x����{o	|�n�����x�sNE�e�Y0.>�@�^&��L@���3!�\�JF�:�w3�L��!�.IC��(�Q�����F3�4H��?�n-�	�M�S���@�{|��������v/�0(��w
qx����{o�t}@����`���rl+����U0��0����A�F��`��/I��aPj7
�P�Y�=}�G�����h*���h�j��Uc��+
SC�A@CA�H=�HD�`
��W�����%��\��I�_w&�[�������}_���)X������sx�Vx�V�nd�����q��OP)�3��S84�����x������y�5nB�� ���$���������t�#���������m�4�;�?���m���:��qP��0m�P�/���7
.�?���Ne�S��w��3N�1)��anO��SJ<���9�l�ms'�q\�E��b?O������q+�p�����w����J){������Ve�yB��(/�����bs���.=��	���3�~�qg:V,-����������'�$rl\�m2XT�����e;0=,������������<^��?����n�t�3�k�n�u�`0�(�W���	<������w��f?���)�}��mn�]����*;!D���0�0����(��(���j:�n����\7��]=�=ZBw#�L��,�{B���yB���C�<!&�wa!��D3������8O ��O��m��l��	1yb��E�w*����zv�'�$����0���|�-�����N@�F�a;A^@@@
��^;�LaP�s��y)0k�{&�����s�LaP�3 daa�=	�������!�@AJ�{�nO��&�0(��i0����0����i-r^p����y�H�B5�0%��n*�A��G����n������{�X
8>�x����0pza��vz�gpe�����^@
/�	�+�,��-'����|"��@�yB�S���m�c��AH��#��A�����V��h^���G��K����T�	G-��(��������m�Ti�s���yB�	x�����u�b����/�V�C������'��yB
� �d����W����x;�9�#M��b-���1d��~���umQ��p��]6��lS��k��Q�1���Xm�:�����1���y#�"��]s�����n[si�6��5G_S�P&`�E���Q�y�����*��r��w�H����p�|�vEL!�"�m6�;G��0�dSS������*b�X1G���C�����oC4\s�5][yT�T��O��dc3���oU�����]:��(�D����GiW��*b��Ys���M61��B�Y�*��"���"��U�q�m����������&<��wvT����(��Sd��t����C��[^�V��O=�mEL!���FVE�v���)���m���?%�,����/k:_1�d�F�W
�	O��������(B������������&<�ME�����(�U�_���Y���v7��
q��AI*��i�����]�y��*��Y]����@OK1t�����y{tV�71G���"���vnW�d�]�{l+���NVE����c��m��]+�f��yF
��I�EW�5����T���?�=E�E3�,����X"�RL^8*b��QS�������������Ki��:o6��Ox�u
O�v��\�M�����_�+4�z;�UlU����}M��"�6rU����V��4���]�8���n��$^�j��u������o����i�������1���i<����e)���q������8(��������������g�-k�V�����u7�����;��j
��S��QG:*b:��1m���QGEL��+b�Xa���g�mM��Ap���~]�e�����������;� �m<�6Y8����].�e��Y����Q�1����_�*�x�ME���r����U���"��Y�
�51��9��6s�8m���xw��j=�S{�V��Av��S=��V�QZ1����_�*b�n��X���"��U��k��mE��"���&����^a�{7����Cm(:L�_�WX��6�#��a�����!�P�����G@}���z �3{���"�N�V��X~l��c+�xC�[
�e��c~yy���(��������7-`�n4�l��s
�������6��\�xg�6e�M	�o���-�2��x��2��q��N��(o6�����oJ�y�2��sp^�B���2���Q�|&D%���6��M���u5�������;�A��F�g�f��>���,�����c~��6�����J����C�2L]�A����! p�,Uv��w�����Q��kx;/DOAc��/�O��\d��������������P2��2�**���������l�p�._�]0����r���V�,<�IQIEND�B`�

#41

Masahiko Sawada

sawada.mshk@gmail.com

over 8 years ago

In reply to: Masahiko Sawada (#40)

1 attachment(s)

Re: Block level parallel vacuum WIP

On Tue, Aug 15, 2017 at 10:13 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Wed, Jul 26, 2017 at 5:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:

On 3/4/17 9:08 PM, Masahiko Sawada wrote:

On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.

I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.

From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.

Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.

I have marked this patch "Returned with Feedback." Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.

Thank you!

I re-considered the basic design of parallel lazy vacuum. I didn't
change the basic concept of this feature and usage, the lazy vacuum
still executes with some parallel workers. In current design, dead
tuple TIDs are shared with all vacuum workers including leader process
when table has index. If we share dead tuple TIDs, we have to make two
synchronization points: before starting vacuum and before clearing
dead tuple TIDs. Before starting vacuum we have to make sure that the
dead tuple TIDs are not added no more. And before clearing dead tuple
TIDs we have to make sure that it's used no more.

For index vacuum, each indexes is assigned to a vacuum workers based
on ParallelWorkerNumber. For example, if a table has 5 indexes and
vacuum with 2 workers, the leader process and one vacuum worker are
assigned to 2 indexes, and another vacuum process is assigned the
remaining one. The following steps are how the parallel vacuum
processes if table has indexes.

1. The leader process and workers scan the table in parallel using
ParallelHeapScanDesc, and collect dead tuple TIDs to shared memory.
2. Before vacuum on table, the leader process sort the dead tuple TIDs
in physical order once all workers completes to scan the table.
3. In vacuum on table, the leader process and workers reclaim garbage
on table in block-level parallel.
4. In vacuum on indexes, the indexes on table is assigned to
particular parallel worker or leader process. The process assigned to
a index vacuums on the index.
5. Before back to scanning the table, the leader process clears the
dead tuple TIDs once all workers completes to vacuum on table and
indexes.

Attached the latest patch but it's still PoC version patch and
contains some debug codes. Note that this patch still requires another
patch which moves the relation extension lock out of heavy-weight
lock[1]. The parallel lazy vacuum patch could work even without [1]
patch but could fail during vacuum in some cases.

Also, I attached the result of performance evaluation. The table size
is approximately 300MB ( > shared_buffers) and I deleted tuples on
every blocks before execute vacuum so that vacuum visits every blocks.
The server spec is
* Intel Xeon E5620 @ 2.4Ghz (8cores)
* 32GB RAM
* ioDrive

According to the result of table with indexes, performance of lazy
vacuum improved up to a point where the number of indexes and parallel
degree are the same. If a table has 16 indexes and vacuum with 16
workers, parallel vacuum is 10x faster than single process execution.
Also according to the result of table with no indexes, the parallel
vacuum is 5x faster than single process execution at 8 parallel
degree. Of course we can vacuum only for indexes

I'm planning to work on that in PG11, will register it to next CF.
Comment and feedback are very welcome.

Since the previous patch conflicts with current HEAD I attached the
latest version patch. Also, I measured performance benefit with more
large 4GB table and indexes and attached the result.

Since v4 patch conflicts with current HEAD I attached the latest version patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

parallel_vacuum_v5.patchapplication/octet-stream; name=parallel_vacuum_v5.patchDownload

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 421c18d..b93231f 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
+VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | PARALLEL <replaceable class="PARAMETER">N</replaceable> | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable class="PARAMETER">table_name</replaceable> ]
 VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
 </synopsis>
@@ -130,6 +130,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">
    </varlistentry>
 
    <varlistentry>
+    <term><literal>PARALLEL <replaceable class="PARAMETER">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable class="PARAMETER">N
+      </replaceable> background workers. Collecting garbage on table is processed
+      in block-level parallel. For tables with indexes, parallel vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a index are processed
+      by particular parallel vacuum worker. This option can not use with <literal>FULL</>
+      option.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>DISABLE_PAGE_SKIPPING</literal></term>
     <listitem>
      <para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index d20f038..faf82db 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -91,7 +91,6 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
 						bool is_samplescan,
 						bool temp_snap);
 static void heap_parallelscan_startblock_init(HeapScanDesc scan);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1715,7 +1714,7 @@ retry:
  *		first backend gets an InvalidBlockNumber return.
  * ----------------
  */
-static BlockNumber
+BlockNumber
 heap_parallelscan_nextpage(HeapScanDesc scan)
 {
 	BlockNumber page;
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index ce1b907..940a218 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -19,6 +19,7 @@
 #include "access/xlog.h"
 #include "catalog/namespace.h"
 #include "commands/async.h"
+#include "commands/vacuum.h"
 #include "executor/execParallel.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -118,6 +119,9 @@ static const struct
 {
 	{
 		"ParallelQueryMain", ParallelQueryMain
+	},
+	{
+		"LazyVacuumWorkerMain", LazyVacuumWorkerMain
 	}
 };
 
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index faa1812..505f0fe 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -72,7 +72,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
 				  MultiXactId lastSaneMinMulti);
-static bool vacuum_rel(Oid relid, RangeVar *relation, int options,
+static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options,
 		   VacuumParams *params);
 
 /*
@@ -87,17 +87,17 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
 	VacuumParams params;
 
 	/* sanity checks on options */
-	Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
-	Assert((vacstmt->options & VACOPT_VACUUM) ||
-		   !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
-	Assert((vacstmt->options & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
-	Assert(!(vacstmt->options & VACOPT_SKIPTOAST));
+	Assert(vacstmt->options.flags & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert((vacstmt->options.flags & VACOPT_VACUUM) ||
+		   !(vacstmt->options.flags & (VACOPT_FULL | VACOPT_FREEZE)));
+	Assert((vacstmt->options.flags & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
+	Assert(!(vacstmt->options.flags & VACOPT_SKIPTOAST));
 
 	/*
 	 * All freeze ages are zero if the FREEZE option is given; otherwise pass
 	 * them as -1 which means to use the default values.
 	 */
-	if (vacstmt->options & VACOPT_FREEZE)
+	if (vacstmt->options.flags & VACOPT_FREEZE)
 	{
 		params.freeze_min_age = 0;
 		params.freeze_table_age = 0;
@@ -146,7 +146,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
  * memory context that will not disappear at transaction commit.
  */
 void
-vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
+vacuum(VacuumOptions options, RangeVar *relation, Oid relid, VacuumParams *params,
 	   List *va_cols, BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	const char *stmttype;
@@ -157,7 +157,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 
 	Assert(params != NULL);
 
-	stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
+	stmttype = (options.flags & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
 
 	/*
 	 * We cannot run VACUUM inside a user transaction block; if we were inside
@@ -167,7 +167,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	 *
 	 * ANALYZE (without VACUUM) can run either way.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 	{
 		PreventTransactionChain(isTopLevel, stmttype);
 		in_outer_xact = false;
@@ -189,17 +189,26 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
-	if ((options & VACOPT_FULL) != 0 &&
-		(options & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
 
 	/*
+	 * Sanity check PARALLEL option.
+	 */
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_PARALLEL) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("VACUUM option PARALLEL cannnot be used with FULL")));
+
+	/*
 	 * Send info about dead objects to the statistics collector, unless we are
 	 * in autovacuum --- autovacuum.c does this for itself.
 	 */
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 		pgstat_vacuum_stat();
 
 	/*
@@ -245,11 +254,11 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 	 * transaction block, and also in an autovacuum worker, use own
 	 * transactions so we can release locks sooner.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 		use_own_xacts = true;
 	else
 	{
-		Assert(options & VACOPT_ANALYZE);
+		Assert(options.flags & VACOPT_ANALYZE);
 		if (IsAutoVacuumWorkerProcess())
 			use_own_xacts = true;
 		else if (in_outer_xact)
@@ -299,13 +308,13 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 		{
 			Oid			relid = lfirst_oid(cur);
 
-			if (options & VACOPT_VACUUM)
+			if (options.flags & VACOPT_VACUUM)
 			{
 				if (!vacuum_rel(relid, relation, options, params))
 					continue;
 			}
 
-			if (options & VACOPT_ANALYZE)
+			if (options.flags & VACOPT_ANALYZE)
 			{
 				/*
 				 * If using separate xacts, start one for analyze. Otherwise,
@@ -318,7 +327,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 					PushActiveSnapshot(GetTransactionSnapshot());
 				}
 
-				analyze_rel(relid, relation, options, params,
+				analyze_rel(relid, relation, options.flags, params,
 							va_cols, in_outer_xact, vac_strategy);
 
 				if (use_own_xacts)
@@ -354,7 +363,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
 		StartTransactionCommand();
 	}
 
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 	{
 		/*
 		 * Update pg_database.datfrozenxid, and truncate pg_xact if possible.
@@ -1221,7 +1230,7 @@ vac_truncate_clog(TransactionId frozenXID,
  *		At entry and exit, we are not inside a transaction.
  */
 static bool
-vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
+vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options, VacuumParams *params)
 {
 	LOCKMODE	lmode;
 	Relation	onerel;
@@ -1242,7 +1251,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 */
 	PushActiveSnapshot(GetTransactionSnapshot());
 
-	if (!(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_FULL))
 	{
 		/*
 		 * In lazy vacuum, we can set the PROC_IN_VACUUM flag, which lets
@@ -1282,7 +1291,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * vacuum, but just ShareUpdateExclusiveLock for concurrent vacuum. Either
 	 * way, we can be sure that no other backend is vacuuming the same table.
 	 */
-	lmode = (options & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+	lmode = (options.flags & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * Open the relation and get the appropriate lock on it.
@@ -1293,7 +1302,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * If we've been asked not to wait for the relation lock, acquire it first
 	 * in non-blocking mode, before calling try_relation_open().
 	 */
-	if (!(options & VACOPT_NOWAIT))
+	if (!(options.flags & VACOPT_NOWAIT))
 		onerel = try_relation_open(relid, lmode);
 	else if (ConditionalLockRelationOid(relid, lmode))
 		onerel = try_relation_open(relid, NoLock);
@@ -1412,7 +1421,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * us to process it.  In VACUUM FULL, though, the toast table is
 	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
 	 */
-	if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_SKIPTOAST) && !(options.flags & VACOPT_FULL))
 		toast_relid = onerel->rd_rel->reltoastrelid;
 	else
 		toast_relid = InvalidOid;
@@ -1431,7 +1440,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	/*
 	 * Do the actual work --- either FULL or "lazy" vacuum
 	 */
-	if (options & VACOPT_FULL)
+	if (options.flags & VACOPT_FULL)
 	{
 		/* close relation before vacuuming, but hold lock until commit */
 		relation_close(onerel, NoLock);
@@ -1439,7 +1448,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 		cluster_rel(relid, InvalidOid, false,
-					(options & VACOPT_VERBOSE) != 0);
+					(options.flags & VACOPT_VERBOSE) != 0);
 	}
 	else
 		lazy_vacuum_rel(onerel, options, params, vac_strategy);
@@ -1493,8 +1502,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
  * hit dangling index pointers.
  */
 void
-vac_open_indexes(Relation relation, LOCKMODE lockmode,
-				 int *nindexes, Relation **Irel)
+vac_open_indexes(Relation relation, LOCKMODE lockmode, int *nindexes, Relation **Irel)
 {
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 45b1859..d0ef39b 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -22,6 +22,20 @@
  * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * In PostgreSQL 10, we support a parallel option for lazy vacuum. In parallel
+ * lazy vacuum, multiple vacuum worker processes get blocks in parallel using
+ * parallel heap scan and process them. If a table with indexes the parallel
+ * vacuum workers vacuum the heap and indexes in parallel.  Also, since dead
+ * tuple TIDs is shared with all vacuum processes including the leader process
+ * the parallel vacuum processes have to make two synchronization points in
+ * lazy vacuum processing: before starting vacuum and before clearing dead
+ * tuple TIDs. In those two points the leader treats dead tuple TIDs as an
+ * arbiter. The information required by parallel lazy vacuum such as the
+ * statistics of table, parallel heap scan description have to be shared with
+ * all vacuum processes, and table statistics are funneled by the leader
+ * process after finished. However, dead tuple TIDs need to be shared only
+ * when the table has indexes. For table with no indexes, each parallel worker
+ * processes blocks and vacuum them independently.
  *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -41,8 +55,10 @@
 #include "access/heapam_xlog.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/relscan.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -54,6 +70,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "utils/lsyscache.h"
@@ -62,6 +79,7 @@
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
 
+//#define PLV_TIME
 
 /*
  * Space/time tradeoff parameters: do these need to be user-tunable?
@@ -103,10 +121,81 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+/* DSM key for parallel lazy vacuum */
+#define VACUUM_KEY_PARALLEL_SCAN	UINT64CONST(0xFFFFFFFFFFF00001)
+#define VACUUM_KEY_VACUUM_STATS		UINT64CONST(0xFFFFFFFFFFF00002)
+#define VACUUM_KEY_INDEX_STATS	    UINT64CONST(0xFFFFFFFFFFF00003)
+#define VACUUM_KEY_DEAD_TUPLE_CTL	UINT64CONST(0xFFFFFFFFFFF00004)
+#define VACUUM_KEY_DEAD_TUPLES		UINT64CONST(0xFFFFFFFFFFF00005)
+#define VACUUM_KEY_PARALLEL_STATE	UINT64CONST(0xFFFFFFFFFFF00006)
+
+/*
+ * see note of lazy_scan_heap_get_nextpage about forcing scanning of
+ * last page
+ */
+#define FORCE_CHECK_PAGE(blk) \
+	(blkno == (blk - 1) && should_attempt_truncation(vacrelstats))
+
+/* Check if given index is assigned to this parallel vacuum worker */
+#define IsAssignedIndex(i, pstate) \
+	(pstate == NULL || \
+	 (((i) % ((LVParallelState *) (pstate))->nworkers -1 ) == ParallelWorkerNumber))
+
+#define IsDeadTupleShared(lvstate) \
+	((LVState *)(lvstate))->parallel_mode && \
+	((LVState *)(lvstate))->vacrelstats->nindexes > 0
+
+/* Vacuum worker state for parallel lazy vacuum */
+#define VACSTATE_SCAN			0x1	/* heap scan phase */
+#define VACSTATE_VACUUM			0x2	/* vacuuming on table and index */
+
+/*
+ * Vacuum relevant options and thresholds we need share with parallel
+ * vacuum workers.
+ */
+typedef struct VacuumInfo
+{
+	int				options;	/* VACUUM options */
+	bool			aggressive;	/* does each worker need to aggressive vacuum? */
+	TransactionId	oldestxmin;
+	TransactionId	freezelimit;
+	MultiXactId		multixactcutoff;
+	int				elevel;
+} VacuumInfo;
+
+/* Struct for index statistics that are used for parallel lazy vacuum */
+typedef struct LVIndStats
+{
+	bool		updated;	/* need to be updated? */
+	BlockNumber	num_pages;
+	BlockNumber	num_tuples;
+} LVIndStats;
+
+/* Struct for parallel lazy vacuum state */
+typedef struct LVParallelState
+{
+	int nworkers;			/* # of process doing vacuum */
+	VacuumInfo	info;
+	int	state;				/* current parallel vacuum status */
+	int	finish_count;
+	ConditionVariable cv;
+	slock_t	mutex;
+} LVParallelState;
+
+/* Struct for control dead tuple TIDs array */
+typedef struct LVDeadTupleCtl
+{
+	int			dt_max;	/* # slots allocated in array */
+	int 		dt_count; /* # of dead tuple */
+
+	/* Used only for parallel lazy vacuum */
+	int			dt_index;
+	slock_t 	mutex;
+} LVDeadTupleCtl;
+
 typedef struct LVRelStats
 {
-	/* hasindex = true means two-pass strategy; false means one-pass */
-	bool		hasindex;
+	int			nindexes; /* > 0 means two-pass strategy; = 0 means one-pass */
 	/* Overall statistics about rel */
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
@@ -118,19 +207,46 @@ typedef struct LVRelStats
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
 	double		new_dead_tuples;	/* new estimated total # of dead tuples */
-	BlockNumber pages_removed;
 	double		tuples_deleted;
-	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
-	/* List of TIDs of tuples we intend to delete */
-	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
 	int			num_index_scans;
+	BlockNumber pages_removed;
+	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+/* Struct for lazy vacuum execution */
+typedef struct LVState
+{
+	bool		parallel_mode;
+	LVRelStats *vacrelstats;
+	/*
+	 * Used when both parallel and non-parallel lazy vacuum, but in parallel
+	 * lazy vacuum and table with index, dtctl points to a dynamic shared memory
+	 * and controlled by dtctl struct.
+	 */
+	LVDeadTupleCtl	*dtctl;
+	ItemPointer	deadtuples;
+
+	/* Used only for parallel lazy vacuum */
+	ParallelContext *pcxt;
+	LVParallelState *pstate;
+	ParallelHeapScanDesc pscan;
+	LVIndStats *indstats;
+} LVState;
+
+/*
+ * Scan description data for lazy vacuum. In parallel lazy vacuum,
+ * we use only heapscan instead.
+ */
+typedef struct LVScanDescData
+{
+	BlockNumber lv_cblock;					/* current scanning block number */
+	BlockNumber lv_next_unskippable_block;	/* next block number we cannot skip */
+	BlockNumber lv_nblocks;					/* the number blocks of relation */
+	HeapScanDesc heapscan;					/* field for parallel lazy vacuum */
+} LVScanDescData;
+typedef struct LVScanDescData *LVScanDesc;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -141,32 +257,47 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
-
-/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
-			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
-			   bool aggressive);
-static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
+/* nonf-export function prototypes */
+static void lazy_vacuum_heap(Relation onerel, LVState *lvstate);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats);
-static void lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats);
+							  LVState *lvstate);
+static void lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
 						 LVRelStats *vacrelstats);
-static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
-static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr);
+static void lazy_space_alloc(LVState *lvstate, BlockNumber relblocks);
+static void lazy_record_dead_tuple(LVState *state, ItemPointer itemptr);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static void do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irels,
+							  int nindexes, int options, bool aggressive);
+static void lazy_scan_heap(Relation rel, LVState *lvstate, VacuumOptions options,
+						   bool aggressive);
+
+/* function prototypes for parallel vacuum */
+static void lazy_gather_vacuum_stats(ParallelContext *pxct,
+									 LVRelStats *valrelstats);
+static void lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats);
+static void lazy_initialize_dsm(ParallelContext *pcxt, Relation onrel,
+								LVState *lvstate, int options, bool aggressive);
+static LVState *lazy_initialize_worker(shm_toc *toc);
+static LVScanDesc lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan);
+static void lv_endscan(LVScanDesc lvscan);
+static BlockNumber lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+											   LVScanDesc lvscan,
+											   bool *all_visible_according_to_vm,
+											   Buffer *vmbuffer, int options, bool aggressive);
+static void lazy_prepare_vacuum(LVState *lvstate);
+static void lazy_end_vacuum(LVState *lvstate);
+static long lazy_get_max_dead_tuples(LVRelStats *vacrelstats);
 
 
 /*
@@ -179,12 +310,11 @@ static bool heap_page_is_all_visible(Relation rel, Buffer buf,
  *		and locked the relation.
  */
 void
-lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+lazy_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
 				BufferAccessStrategy bstrategy)
 {
-	LVRelStats *vacrelstats;
-	Relation   *Irel;
-	int			nindexes;
+	LVState		*lvstate;
+	LVRelStats	*vacrelstats;
 	PGRUsage	ru0;
 	TimestampTz starttime = 0;
 	long		secs;
@@ -211,7 +341,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		starttime = GetCurrentTimestamp();
 	}
 
-	if (options & VACOPT_VERBOSE)
+	if (options.flags & VACOPT_VERBOSE)
 		elevel = INFO;
 	else
 		elevel = DEBUG2;
@@ -239,10 +369,12 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 											   xidFullScanLimit);
 	aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
 											  mxactFullScanLimit);
-	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+	if (options.flags & VACOPT_DISABLE_PAGE_SKIPPING)
 		aggressive = true;
 
+	lvstate = (LVState *) palloc0(sizeof(LVState));
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
+	lvstate->vacrelstats = vacrelstats;
 
 	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
 	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
@@ -250,15 +382,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
-	/* Open all indexes of the relation */
-	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
-	vacrelstats->hasindex = (nindexes > 0);
-
 	/* Do the vacuuming */
-	lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
-
-	/* Done with indexes */
-	vac_close_indexes(nindexes, Irel, NoLock);
+	lazy_scan_heap(onerel, lvstate, options, aggressive);
 
 	/*
 	 * Compute whether we actually scanned the all unfrozen pages. If we did,
@@ -267,7 +392,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	 * NB: We need to check this before truncating the relation, because that
 	 * will change ->rel_pages.
 	 */
-	if ((vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
+	if ((lvstate->vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
 		< vacrelstats->rel_pages)
 	{
 		Assert(!aggressive);
@@ -329,7 +454,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 						new_rel_pages,
 						new_rel_tuples,
 						new_rel_allvisible,
-						vacrelstats->hasindex,
+						(vacrelstats->nindexes != 0),
 						new_frozen_xid,
 						new_min_multi,
 						false);
@@ -439,28 +564,166 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
- *	lazy_scan_heap() -- scan an open heap relation
+ * If the number of workers is specified more than 0, we enter the parallel lazy
+ * vacuum mode. In parallel lazy vacuum mode, we initialize a dynamic shared memory
+ * and launch parallel vacuum workers. The launcher process also vacuums the table
+ * after launched and then waits for the all vacuum workers to finish. After all vacuum
+ * workers finished we gather the vacuum statistics of table and indexes, and update
+ * them.
+ */
+static void
+lazy_scan_heap(Relation onerel, LVState *lvstate, VacuumOptions options,
+			   bool aggressive)
+{
+	ParallelContext	*pcxt;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+	Relation	*Irel;
+	int			nindexes;
+
+	lvstate->parallel_mode = options.nworkers > 0;
+
+	/* Open indexes */
+	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
+	vacrelstats->nindexes = nindexes;
+
+	if (lvstate->parallel_mode)
+	{
+		EnterParallelMode();
+
+		/* Create parallel context and initialize it */
+		pcxt = CreateParallelContext("postgres", "LazyVacuumWorkerMain",
+									 options.nworkers);
+		lvstate->pcxt = pcxt;
+
+		/* Estimate DSM size for parallel vacuum */
+		lazy_estimate_dsm(pcxt, lvstate->vacrelstats);
+
+		/* Initialize DSM for parallel vacuum */
+		InitializeParallelDSM(pcxt);
+		lazy_initialize_dsm(pcxt, onerel, lvstate, options.flags, aggressive);
+
+		/* Launch workers */
+		LaunchParallelWorkers(pcxt);
+	}
+
+	do_lazy_scan_heap(lvstate, onerel, Irel, nindexes, options.flags, aggressive);
+
+	/*
+	 * We can update relation statistics such as scanned page after gathered
+	 * statistics from all workers. Also, in parallel mode since we cannot update
+	 * index statistics at the same time the leader process have to do it.
+	 *
+	 * XXX : If we allows workers to update statistics tuples at the same time
+	 * the updating index statistics can be done in lazy_cleanup_index().
+	 */
+	if (lvstate->parallel_mode)
+	{
+		int i;
+		LVIndStats *indstats = palloc(sizeof(LVIndStats) * lvstate->vacrelstats->nindexes);
+
+		/* Wait for workers finished vacuum */
+		WaitForParallelWorkersToFinish(pcxt);
+
+		/* Gather the result of vacuum statistics from all workers */
+		lazy_gather_vacuum_stats(pcxt, vacrelstats);
+
+		/* Now we can compute the new value for pg_class.reltuples */
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 vacrelstats->rel_pages,
+															 vacrelstats->scanned_pages,
+															 vacrelstats->scanned_tuples);
+
+		/* Copy new index stats to local memory */
+		memcpy(indstats, lvstate->indstats, sizeof(LVIndStats) * vacrelstats->nindexes);
+
+		DestroyParallelContext(pcxt);
+		ExitParallelMode();
+
+		/* After exit parallel mode, update index statistics */
+		for (i = 0; i < vacrelstats->nindexes; i++)
+		{
+			Relation	ind = Irel[i];
+			LVIndStats *indstat = (LVIndStats *) &(indstats[i]);
+
+			if (indstat->updated)
+			   vac_update_relstats(ind,
+								   indstat->num_pages,
+								   indstat->num_tuples,
+								   0,
+								   false,
+								   InvalidTransactionId,
+								   InvalidMultiXactId,
+								   false);
+		}
+	}
+
+	vac_close_indexes(nindexes, Irel, RowExclusiveLock);
+}
+
+/*
+ * Entry point of parallel vacuum worker.
+ */
+void
+LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc)
+{
+	LVState		*lvstate;
+	Relation rel;
+	Relation *indrel;
+	int nindexes_worker;
+
+	/* Look up dynamic shared memory and initialize */
+	lvstate = lazy_initialize_worker(toc);
+
+	Assert(lvstate != NULL);
+
+	rel = relation_open(lvstate->pscan->phs_relid, ShareUpdateExclusiveLock);
+
+	/* Open all indexes */
+	vac_open_indexes(rel, RowExclusiveLock, &nindexes_worker,
+					 &indrel);
+
+	/* Do lazy vacuum */
+	do_lazy_scan_heap(lvstate, rel, indrel, lvstate->vacrelstats->nindexes,
+					  lvstate->pstate->info.options, lvstate->pstate->info.aggressive);
+
+	vac_close_indexes(lvstate->vacrelstats->nindexes, indrel, RowExclusiveLock);
+	heap_close(rel, ShareUpdateExclusiveLock);
+}
+
+/*
+ *	do_lazy_scan_heap() -- scan an open heap relation
  *
  *		This routine prunes each page in the heap, which will among other
  *		things truncate dead tuples to dead line pointers, defragment the
- *		page, and set commit status bits (see heap_page_prune).  It also builds
+ *		page, and set commit status bits (see heap_page_prune).  It also uses
  *		lists of dead tuples and pages with free space, calculates statistics
  *		on the number of live tuples in the heap, and marks pages as
  *		all-visible if appropriate.  When done, or when we run low on space for
- *		dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
- *		to reclaim dead line pointers.
+ *		dead-tuple TIDs, invoke vacuuming of assigned indexes and call lazy_vacuum_heap
+ *		to reclaim dead line pointers. In parallel vacuum, we need to synchronize
+ *		at where scanning heap finished and vacuuming heap finished. The vacuum
+ *		worker reached to that point first need to wait for other vacuum workers
+ *		reached to the same point.
+ *
+ *		In parallel lazy scan, we get next page number using parallel heap scan.
+ *		Since the dead tuple TIDs are shared with all vacuum workers, we have to
+ *		wait for all other workers to reach to the same points where before starting
+ *		reclaiming dead tuple TIDs and before clearing dead tuple TIDs information
+ *		in dynamic shared memory.
  *
  *		If there are no indexes then we can reclaim line pointers on the fly;
  *		dead line pointers need only be retained until all index pointers that
  *		reference them have been killed.
  */
 static void
-lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive)
+do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irel,
+				  int nindexes, int options, bool aggressive)
 {
-	BlockNumber nblocks,
-				blkno;
+	LVRelStats *vacrelstats = lvstate->vacrelstats;
+	BlockNumber blkno;
+	BlockNumber nblocks;
 	HeapTupleData tuple;
+	LVScanDesc lvscan;
 	char	   *relname;
 	BlockNumber empty_pages,
 				vacuumed_pages;
@@ -471,11 +734,15 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	IndexBulkDeleteResult **indstats;
 	int			i;
 	PGRUsage	ru0;
+#ifdef PLV_TIME
+	PGRUsage	ru_scan;
+	PGRUsage	ru_vacuum;
+#endif
 	Buffer		vmbuffer = InvalidBuffer;
-	BlockNumber next_unskippable_block;
-	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
+	bool		all_visible_according_to_vm = false;
+
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -504,89 +771,24 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->nonempty_pages = 0;
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
-	lazy_space_alloc(vacrelstats, nblocks);
+	lazy_space_alloc(lvstate, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
+	/* Begin heap scan for vacuum */
+	lvscan = lv_beginscan(onerel, lvstate->pscan);
+
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = vacrelstats->max_dead_tuples;
+	initprog_val[2] = lvstate->dtctl->dt_max;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
-	/*
-	 * Except when aggressive is set, we want to skip pages that are
-	 * all-visible according to the visibility map, but only when we can skip
-	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
-	 * sequentially, the OS should be doing readahead for us, so there's no
-	 * gain in skipping a page now and then; that's likely to disable
-	 * readahead and so be counterproductive. Also, skipping even a single
-	 * page means that we can't update relfrozenxid, so we only want to do it
-	 * if we can skip a goodly number of pages.
-	 *
-	 * When aggressive is set, we can't skip pages just because they are
-	 * all-visible, but we can still skip pages that are all-frozen, since
-	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
-	 *
-	 * Before entering the main loop, establish the invariant that
-	 * next_unskippable_block is the next block number >= blkno that we can't
-	 * skip based on the visibility map, either all-visible for a regular scan
-	 * or all-frozen for an aggressive scan.  We set it to nblocks if there's
-	 * no such block.  We also set up the skipping_blocks flag correctly at
-	 * this stage.
-	 *
-	 * Note: The value returned by visibilitymap_get_status could be slightly
-	 * out-of-date, since we make this test before reading the corresponding
-	 * heap page or locking the buffer.  This is OK.  If we mistakenly think
-	 * that the page is all-visible or all-frozen when in fact the flag's just
-	 * been cleared, we might fail to vacuum the page.  It's easy to see that
-	 * skipping a page when aggressive is not set is not a very big deal; we
-	 * might leave some dead tuples lying around, but the next vacuum will
-	 * find them.  But even when aggressive *is* set, it's still OK if we miss
-	 * a page whose all-frozen marking has just been cleared.  Any new XIDs
-	 * just added to that page are necessarily newer than the GlobalXmin we
-	 * computed, so they'll have no effect on the value to which we can safely
-	 * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
-	 *
-	 * We will scan the table's last page, at least to the extent of
-	 * determining whether it has tuples or not, even if it should be skipped
-	 * according to the above rules; except when we've already determined that
-	 * it's not worth trying to truncate the table.  This avoids having
-	 * lazy_truncate_heap() take access-exclusive lock on the table to attempt
-	 * a truncation that just fails immediately because there are tuples in
-	 * the last page.  This is worth avoiding mainly because such a lock must
-	 * be replayed on any hot standby, where it can be disruptive.
-	 */
-	next_unskippable_block = 0;
-	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-	{
-		while (next_unskippable_block < nblocks)
-		{
-			uint8		vmstatus;
-
-			vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
-												&vmbuffer);
-			if (aggressive)
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-					break;
-			}
-			else
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-					break;
-			}
-			vacuum_delay_point();
-			next_unskippable_block++;
-		}
-	}
-
-	if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
-		skipping_blocks = true;
-	else
-		skipping_blocks = false;
-
-	for (blkno = 0; blkno < nblocks; blkno++)
+#ifdef PLV_TIME
+	pg_rusage_init(&ru_scan);
+#endif
+	while((blkno = lazy_scan_get_nextpage(onerel, lvstate, lvscan,
+										  &all_visible_according_to_vm,
+										  &vmbuffer, options, aggressive)) != InvalidBlockNumber)
 	{
 		Buffer		buf;
 		Page		page;
@@ -597,99 +799,31 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		int			prev_dead_count;
 		int			nfrozen;
 		Size		freespace;
-		bool		all_visible_according_to_vm = false;
 		bool		all_visible;
 		bool		all_frozen = true;	/* provided all_visible is also true */
 		bool		has_dead_tuples;
 		TransactionId visibility_cutoff_xid = InvalidTransactionId;
-
-		/* see note above about forcing scanning of last page */
-#define FORCE_CHECK_PAGE() \
-		(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
+		int			dtmax;
+		int			dtcount;
 
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
-		if (blkno == next_unskippable_block)
-		{
-			/* Time to advance next_unskippable_block */
-			next_unskippable_block++;
-			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-			{
-				while (next_unskippable_block < nblocks)
-				{
-					uint8		vmskipflags;
-
-					vmskipflags = visibilitymap_get_status(onerel,
-														   next_unskippable_block,
-														   &vmbuffer);
-					if (aggressive)
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-							break;
-					}
-					else
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-							break;
-					}
-					vacuum_delay_point();
-					next_unskippable_block++;
-				}
-			}
-
-			/*
-			 * We know we can't skip the current block.  But set up
-			 * skipping_blocks to do the right thing at the following blocks.
-			 */
-			if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
-				skipping_blocks = true;
-			else
-				skipping_blocks = false;
-
-			/*
-			 * Normally, the fact that we can't skip this block must mean that
-			 * it's not all-visible.  But in an aggressive vacuum we know only
-			 * that it's not all-frozen, so it might still be all-visible.
-			 */
-			if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
-				all_visible_according_to_vm = true;
-		}
-		else
-		{
-			/*
-			 * The current block is potentially skippable; if we've seen a
-			 * long enough run of skippable blocks to justify skipping it, and
-			 * we're not forced to check it, then go ahead and skip.
-			 * Otherwise, the page must be at least all-visible if not
-			 * all-frozen, so we can set all_visible_according_to_vm = true.
-			 */
-			if (skipping_blocks && !FORCE_CHECK_PAGE())
-			{
-				/*
-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was all-frozen, so we have to recheck; but
-				 * in this case an approximate answer is OK.
-				 */
-				if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
-					vacrelstats->frozenskipped_pages++;
-				continue;
-			}
-			all_visible_according_to_vm = true;
-		}
-
 		vacuum_delay_point();
 
 		/*
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
 		 */
-		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
-			vacrelstats->num_dead_tuples > 0)
+		if (IsDeadTupleShared(lvstate))
+			SpinLockAcquire(&lvstate->dtctl->mutex);
+
+		dtmax = lvstate->dtctl->dt_max;
+		dtcount = lvstate->dtctl->dt_count;
+
+		if (IsDeadTupleShared(lvstate))
+			SpinLockRelease(&lvstate->dtctl->mutex);
+
+		if (((dtmax - dtcount) < MaxHeapTuplesPerPage) && dtcount > 0)
 		{
 			const int	hvp_index[] = {
 				PROGRESS_VACUUM_PHASE,
@@ -697,6 +831,19 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			};
 			int64		hvp_val[2];
 
+#ifdef PLV_TIME
+			elog(WARNING, "%d Scan %s", ParallelWorkerNumber, pg_rusage_show(&ru_scan));
+#endif
+			/*
+			 * Here we're about to vacuum the table and indexes actually. Before
+			 * entering vacuum state, we have to wait for other vacuum worker to
+			 * reach here.
+			 */
+			lazy_prepare_vacuum(lvstate);
+#ifdef PLV_TIME
+			pg_rusage_init(&ru_vacuum);
+#endif
+
 			/*
 			 * Before beginning index vacuuming, we release any pin we may
 			 * hold on the visibility map page.  This isn't necessary for
@@ -716,11 +863,12 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
 
-			/* Remove index entries */
+			/* Remove assigned index entries */
 			for (i = 0; i < nindexes; i++)
-				lazy_vacuum_index(Irel[i],
-								  &indstats[i],
-								  vacrelstats);
+			{
+				if (IsAssignedIndex(i, lvstate->pstate))
+					lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+			}
 
 			/*
 			 * Report that we are now vacuuming the heap.  We also increase
@@ -733,19 +881,28 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
 
 			/* Remove tuples from heap */
-			lazy_vacuum_heap(onerel, vacrelstats);
+			lazy_vacuum_heap(onerel, lvstate);
 
+#ifdef PLV_TIME
+			elog(WARNING, "%d VACUUM : %s", ParallelWorkerNumber, pg_rusage_show(&ru_vacuum));
+#endif
 			/*
-			 * Forget the now-vacuumed tuples, and press on, but be careful
-			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
+			 * Here we've done vacuum on the heap and index and we are going
+			 * to begin the next round scan on heap. Wait until all vacuum worker
+			 * finished vacuum. After all vacuum workers finished, forget the
+			 * now-vacuumed tuples, and press on, but be careful not to reset
+			 * latestRemoveXid since we want that value to be valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
-			vacrelstats->num_index_scans++;
+			lazy_end_vacuum(lvstate);
+#ifdef PLV_TIME
+			pg_rusage_init(&ru_scan);
+#endif
 
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			vacrelstats->num_index_scans++;
 		}
 
 		/*
@@ -771,7 +928,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * it's OK to skip vacuuming pages we get a lock conflict on. They
 			 * will be dealt with in some future vacuum.
 			 */
-			if (!aggressive && !FORCE_CHECK_PAGE())
+			if (!aggressive && !FORCE_CHECK_PAGE(blkno))
 			{
 				ReleaseBuffer(buf);
 				vacrelstats->pinskipped_pages++;
@@ -923,7 +1080,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
+		prev_dead_count = lvstate->dtctl->dt_count;
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -962,7 +1119,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 */
 			if (ItemIdIsDead(itemid))
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				all_visible = false;
 				continue;
 			}
@@ -1067,7 +1224,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			if (tupgone)
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
 													   &vacrelstats->latestRemovedXid);
 				tups_vacuumed += 1;
@@ -1132,13 +1289,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/*
 		 * If there are no indexes then we can vacuum the page right now
-		 * instead of doing a second scan.
+		 * instead of doing a second scan. Because each parallel worker uses its
+		 * own dead tuple area they can vacuum independently.
 		 */
-		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
+		if (Irel == NULL && lvstate->dtctl->dt_count > 0)
 		{
 			/* Remove tuples from heap */
-			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+			lazy_vacuum_page(onerel, blkno, buf, 0, lvstate, &vmbuffer);
 			has_dead_tuples = false;
 
 			/*
@@ -1146,7 +1303,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lvstate->dtctl->dt_count = 0;
+
 			vacuumed_pages++;
 		}
 
@@ -1249,7 +1407,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * page, so remember its free space as-is.  (This path will always be
 		 * taken if there are no indexes.)
 		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		if (lvstate->dtctl->dt_count == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
@@ -1264,10 +1422,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_dead_tuples = nkeep;
 
 	/* now we can compute the new value for pg_class.reltuples */
-	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
-														 nblocks,
-														 vacrelstats->tupcount_pages,
-														 num_tuples);
+	if (!lvstate->parallel_mode)
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 nblocks,
+															 vacrelstats->tupcount_pages,
+															 num_tuples);
 
 	/*
 	 * Release any remaining pin on visibility map page.
@@ -1280,13 +1439,25 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (vacrelstats->num_dead_tuples > 0)
+	if (lvstate->dtctl->dt_count > 0)
 	{
 		const int	hvp_index[] = {
 			PROGRESS_VACUUM_PHASE,
 			PROGRESS_VACUUM_NUM_INDEX_VACUUMS
 		};
 		int64		hvp_val[2];
+#ifdef PLV_TIME
+		elog(WARNING, "%d Scan %s", ParallelWorkerNumber, pg_rusage_show(&ru_scan));
+#endif
+		/*
+		 * Here we're about to vacuum the table and indexes actually. Before
+		 * entering vacuum state, we have to wait for other vacuum worker to
+		 * reach here.
+		 */
+		lazy_prepare_vacuum(lvstate);
+#ifdef PLV_TIME
+		pg_rusage_init(&ru_vacuum);
+#endif
 
 		/* Log cleanup info before we touch indexes */
 		vacuum_log_cleanup_info(onerel, vacrelstats);
@@ -1297,9 +1468,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/* Remove index entries */
 		for (i = 0; i < nindexes; i++)
-			lazy_vacuum_index(Irel[i],
-							  &indstats[i],
-							  vacrelstats);
+		{
+			if (IsAssignedIndex(i, lvstate->pstate))
+				lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+		}
 
 		/* Report that we are now vacuuming the heap */
 		hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
@@ -1309,8 +1481,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		/* Remove tuples from heap */
 		pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 									 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
-		lazy_vacuum_heap(onerel, vacrelstats);
+
+		lazy_vacuum_heap(onerel, lvstate);
+
 		vacrelstats->num_index_scans++;
+#ifdef PLV_TIME
+		elog(WARNING, "%d VACUUM : %s", ParallelWorkerNumber, pg_rusage_show(&ru_vacuum));
+#endif
 	}
 
 	/* report all blocks vacuumed; and that we're cleaning up */
@@ -1320,7 +1497,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
-		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+	{
+		if (IsAssignedIndex(i, lvstate->pstate))
+			lazy_cleanup_index(Irel[i], indstats[i], lvstate->vacrelstats,
+							   lvstate->parallel_mode ? &(lvstate->indstats[i]) : NULL);
+	}
 
 	/* If no indexes, make log report that lazy_vacuum_heap would've made */
 	if (vacuumed_pages)
@@ -1329,12 +1510,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						RelationGetRelationName(onerel),
 						tups_vacuumed, vacuumed_pages)));
 
+	lv_endscan(lvscan);
+
 	/*
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
 	 */
 	initStringInfo(&buf);
 	appendStringInfo(&buf,
+					 "------- worker %d TOTAL stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
 					 _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
 					 nkeep, OldestXmin);
 	appendStringInfo(&buf, _("There were %.0f unused item pointers.\n"),
@@ -1362,6 +1547,35 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pfree(buf.data);
 }
 
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+lazy_gather_vacuum_stats(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	int	i;
+	LVRelStats *lvstats_list;
+
+	lvstats_list = (LVRelStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_VACUUM_STATS, false);
+
+	/* Gather each worker stats */
+	for (i = 0; i < pcxt->nworkers_launched; i++)
+	{
+		LVRelStats *wstats = lvstats_list + sizeof(LVRelStats) * i;
+
+		vacrelstats->scanned_pages += wstats->scanned_pages;
+		vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+		vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+		vacrelstats->scanned_tuples += wstats->scanned_tuples;
+		vacrelstats->new_dead_tuples += wstats->new_dead_tuples;
+		vacrelstats->pages_removed += wstats->pages_removed;
+		vacrelstats->tuples_deleted += wstats->tuples_deleted;
+		vacrelstats->nonempty_pages += wstats->nonempty_pages;
+	}
+
+	/* all vacuum workers have same value of rel_pages */
+	vacrelstats->rel_pages = lvstats_list->rel_pages;
+}
 
 /*
  *	lazy_vacuum_heap() -- second pass over the heap
@@ -1375,18 +1589,24 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
  * process index entry removal in batches as large as possible.
  */
 static void
-lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
+lazy_vacuum_heap(Relation onerel, LVState *lvstate)
 {
 	int			tupindex;
 	int			npages;
 	PGRUsage	ru0;
+	BlockNumber	prev_tblk;
 	Buffer		vmbuffer = InvalidBuffer;
+	ItemPointer	deadtuples = lvstate->deadtuples;
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+	BlockNumber	ntuples = 0;
+	StringInfoData	buf;
 
 	pg_rusage_init(&ru0);
 	npages = 0;
 
 	tupindex = 0;
-	while (tupindex < vacrelstats->num_dead_tuples)
+
+	while (tupindex < dtctl->dt_count)
 	{
 		BlockNumber tblk;
 		Buffer		buf;
@@ -1395,7 +1615,40 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 
 		vacuum_delay_point();
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		/*
+		 * If the dead tuple TIDs are shared with all vacuum workers,
+		 * we acquire the lock and advance tupindex before vacuuming.
+		 *
+		 * NB: The number of maximum tuple can be stored into single
+		 * page is not a large number in most cases. We can use spinlock
+		 * here.
+		 */
+		if (IsDeadTupleShared(lvstate))
+		{
+			SpinLockAcquire(&(dtctl->mutex));
+
+			tupindex = dtctl->dt_index;
+
+			if (tupindex >= dtctl->dt_count)
+			{
+				SpinLockRelease(&(dtctl->mutex));
+				break;
+			}
+
+			/* Advance dtct->dt_index */
+			prev_tblk = tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
+			while(prev_tblk == tblk &&
+				  dtctl->dt_index < dtctl->dt_count)
+			{
+				tblk = ItemPointerGetBlockNumber(&deadtuples[dtctl->dt_index]);
+				dtctl->dt_index++;
+				ntuples++;
+			}
+
+			SpinLockRelease(&(dtctl->mutex));
+		}
+
+		tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vac_strategy);
 		if (!ConditionalLockBufferForCleanup(buf))
@@ -1404,7 +1657,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 			++tupindex;
 			continue;
 		}
-		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, lvstate,
 									&vmbuffer);
 
 		/* Now that we've compacted the page, record its available space */
@@ -1422,10 +1675,17 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		vmbuffer = InvalidBuffer;
 	}
 
+#ifdef PLV_TIME
+	elog(WARNING, "%d TABLE %s", ParallelWorkerNumber, pg_rusage_show(&ru0));
+#endif
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "------- worker %d VACUUM HEAP stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
+					 "\"%s\": removed %d row versions in %d pages",
+					 RelationGetRelationName(onerel), ntuples, npages);
 	ereport(elevel,
-			(errmsg("\"%s\": removed %d row versions in %d pages",
-					RelationGetRelationName(onerel),
-					tupindex, npages),
+			(errmsg("%s", buf.data),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1435,34 +1695,32 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
  *
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
- * tuple for this page.  We assume the rest follow sequentially.
- * The return value is the first tupindex after the tuples of this page.
  */
 static int
 lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxOffsetNumber];
 	int			uncnt = 0;
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+	for (; tupindex < lvstate->dtctl->dt_count; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&lvstate->deadtuples[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&lvstate->deadtuples[tupindex]);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
@@ -1587,14 +1845,15 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
  *	lazy_vacuum_index() -- vacuum one index relation.
  *
  *		Delete all the index entries pointing to tuples listed in
- *		vacrelstats->dead_tuples, and update running statistics.
+ *		lvstate->deadtuples, and update running statistics.
  */
 static void
 lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats)
+				  LVState	*lvstate)
 {
 	IndexVacuumInfo ivinfo;
+	StringInfoData buf;
 	PGRUsage	ru0;
 
 	pg_rusage_init(&ru0);
@@ -1603,17 +1862,25 @@ lazy_vacuum_index(Relation indrel,
 	ivinfo.analyze_only = false;
 	ivinfo.estimated_count = true;
 	ivinfo.message_level = elevel;
-	ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
+	ivinfo.num_heap_tuples = lvstate->vacrelstats->old_rel_tuples;
 	ivinfo.strategy = vac_strategy;
 
 	/* Do bulk deletion */
-	*stats = index_bulk_delete(&ivinfo, *stats,
-							   lazy_tid_reaped, (void *) vacrelstats);
+	*stats = index_bulk_delete(&ivinfo, *stats, lazy_tid_reaped, (void *) lvstate);
+
+#ifdef PLV_TIME
+	elog(WARNING, "%d INDEX(%d) %s", ParallelWorkerNumber, RelationGetRelid(indrel),
+		 pg_rusage_show(&ru0));
+#endif
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "------- worker %d VACUUM INDEX stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
+					 "scanned index \"%s\" to remove %d row versions",
+					 RelationGetRelationName(indrel), lvstate->dtctl->dt_count);
 
 	ereport(elevel,
-			(errmsg("scanned index \"%s\" to remove %d row versions",
-					RelationGetRelationName(indrel),
-					vacrelstats->num_dead_tuples),
+			(errmsg("%s", buf.data),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1621,11 +1888,11 @@ lazy_vacuum_index(Relation indrel,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  */
 static void
-lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats)
+lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat)
 {
 	IndexVacuumInfo ivinfo;
+	StringInfoData	buf;
 	PGRUsage	ru0;
 
 	pg_rusage_init(&ru0);
@@ -1639,34 +1906,54 @@ lazy_cleanup_index(Relation indrel,
 
 	stats = index_vacuum_cleanup(&ivinfo, stats);
 
+	/* Will be updated by leader process after vacuumed */
+	if (indstat)
+		indstat->updated = false;
+
 	if (!stats)
 		return;
 
 	/*
 	 * Now update statistics in pg_class, but only if the index says the count
-	 * is accurate.
+	 * is accurate. In parallel lazy vacuum, the worker can not update these
+	 * information by itself, so save to DSM and then the launcher process
+	 * updates it later.
 	 */
 	if (!stats->estimated_count)
-		vac_update_relstats(indrel,
-							stats->num_pages,
-							stats->num_index_tuples,
-							0,
-							false,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							false);
+	{
+		if (indstat)
+		{
+			indstat->updated = true;
+			indstat->num_pages = stats->num_pages;
+			indstat->num_tuples = stats->num_index_tuples;
+		}
+		else
+			vac_update_relstats(indrel,
+								stats->num_pages,
+								stats->num_index_tuples,
+								0,
+								false,
+								InvalidTransactionId,
+								InvalidMultiXactId,
+								false);
+	}
 
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "------- worker %d CLEANUP INDEX stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
+					 "index \"%s\" now contains %.0f row versions in %u pages",
+					 RelationGetRelationName(indrel),
+					 stats->num_index_tuples,
+					 stats->num_pages);
 	ereport(elevel,
-			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
-					RelationGetRelationName(indrel),
-					stats->num_index_tuples,
-					stats->num_pages),
-			 errdetail("%.0f index row versions were removed.\n"
-					   "%u index pages have been deleted, %u are currently reusable.\n"
-					   "%s.",
-					   stats->tuples_removed,
-					   stats->pages_deleted, stats->pages_free,
-					   pg_rusage_show(&ru0))));
+			(errmsg("%s", buf.data),
+					errdetail("%.0f index row versions were removed.\n"
+							  "%u index pages have been deleted, %u are currently reusable.\n"
+							  "%s.",
+							  stats->tuples_removed,
+							  stats->pages_deleted, stats->pages_free,
+							  pg_rusage_show(&ru0))));
 
 	pfree(stats);
 }
@@ -1976,59 +2263,66 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 /*
  * lazy_space_alloc - space allocation decisions for lazy vacuum
  *
+ * In parallel lazy vacuum the space for dead tuple locations are already
+ * allocated in dynamic shared memory, so we allocate space for dead tuple
+ * locations in local memory only when in not parallel lazy vacuum and set
+ * MyDeadTuple.
+ *
  * See the comments at the head of this file for rationale.
  */
 static void
-lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
+lazy_space_alloc(LVState *lvstate, BlockNumber relblocks)
 {
-	long		maxtuples;
-	int			vac_work_mem = IsAutoVacuumWorkerProcess() &&
-	autovacuum_work_mem != -1 ?
-	autovacuum_work_mem : maintenance_work_mem;
+	long maxtuples;
 
-	if (vacrelstats->hasindex)
-	{
-		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
-		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+	/*
+	 * In parallel mode, we already set the pointer to dead tuple
+	 * array when initialize.
+	 */
+	if (lvstate->parallel_mode && lvstate->vacrelstats->nindexes > 0)
+		return;
 
-		/* curious coding here to ensure the multiplication can't overflow */
-		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
-			maxtuples = relblocks * LAZY_ALLOC_TUPLES;
+	maxtuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
 
-		/* stay sane if small maintenance_work_mem */
-		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
-	}
-	else
-	{
-		maxtuples = MaxHeapTuplesPerPage;
-	}
-
-	vacrelstats->num_dead_tuples = 0;
-	vacrelstats->max_dead_tuples = (int) maxtuples;
-	vacrelstats->dead_tuples = (ItemPointer)
-		palloc(maxtuples * sizeof(ItemPointerData));
+	/*
+	 * If in not parallel lazy vacuum, we need to allocate dead
+	 * tuple array in local memory.
+	 */
+	lvstate->deadtuples = palloc0(sizeof(ItemPointerData) * (int)maxtuples);
+	lvstate->dtctl = (LVDeadTupleCtl *) palloc(sizeof(LVDeadTupleCtl));
+	lvstate->dtctl->dt_max = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+	lvstate->dtctl->dt_count = 0;
 }
 
 /*
  * lazy_record_dead_tuple - remember one deletable tuple
+ *
+ * Acquiring the spinlock before remember is required if the dead tuple
+ * TIDs are shared with other vacuum workers.
  */
 static void
-lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr)
+lazy_record_dead_tuple(LVState *lvstate, ItemPointer itemptr)
 {
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockAcquire(&(dtctl->mutex));
+
 	/*
 	 * The array shouldn't overflow under normal behavior, but perhaps it
 	 * could if we are given a really small maintenance_work_mem. In that
 	 * case, just forget the last few tuples (we'll get 'em next time).
 	 */
-	if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
+	if (dtctl->dt_count < dtctl->dt_max)
 	{
-		vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
-		vacrelstats->num_dead_tuples++;
-		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-									 vacrelstats->num_dead_tuples);
+
+		lvstate->deadtuples[dtctl->dt_count] = *itemptr;
+		(dtctl->dt_count)++;
+		/* XXX : Update progress information here */
 	}
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockRelease(&(dtctl->mutex));
 }
 
 /*
@@ -2041,16 +2335,23 @@ lazy_record_dead_tuple(LVRelStats *vacrelstats,
 static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
-	LVRelStats *vacrelstats = (LVRelStats *) state;
+	LVState *lvstate = (LVState *) state;
 	ItemPointer res;
 
+	/*
+	 * We can assume that the dead tuple TIDs are sorted by TID location
+	 * even when we shared the dead tuple TIDs with other vacuum workers.
+	 */
 	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) vacrelstats->dead_tuples,
-								vacrelstats->num_dead_tuples,
+								(void *) lvstate->deadtuples,
+								lvstate->dtctl->dt_count,
 								sizeof(ItemPointerData),
 								vac_cmp_itemptr);
 
-	return (res != NULL);
+	if (res != NULL)
+		return true;
+
+	return false;
 }
 
 /*
@@ -2194,3 +2495,622 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 	return all_visible;
 }
+
+/*
+ * Return the block number we need to scan next, or InvalidBlockNumber if scan
+ * is done.
+ *
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
+ * sequentially, the OS should be doing readahead for us, so there's no
+ * gain in skipping a page now and then; that's likely to disable
+ * readahead and so be counterproductive. Also, skipping even a single
+ * page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * Before entering the main loop, establish the invariant that
+ * next_unskippable_block is the next block number >= blkno that we can't
+ * skip based on the visibility map, either all-visible for a regular scan
+ * or all-frozen for an aggressive scan.  We set it to nblocks if there's
+ * no such block.  We also set up the skipping_blocks flag correctly at
+ * this stage.
+ *
+ * In not parallel mode, before entering the main loop, establish the
+ * invariant that next_unskippable_block is the next block number >= blkno
+ * that's not we can't skip based on the visibility map, either all-visible
+ * for a regular scan or all-frozen for an aggressive scan.  We set it to
+ * nblocks if there's no such block.  We also set up the skipping_blocks
+ * flag correctly at this stage.
+ *
+ * In parallel mode, pstate is not NULL. We scan heap pages
+ * using parallel heap scan description. Each worker calls heap_parallelscan_nextpage()
+ * in order to exclusively get  block number we need to scan at next.
+ * If given block is all-visible according to visibility map, we skip to
+ * scan this block immediately unlike not parallel lazy scan.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding
+ * heap page or locking the buffer.  This is OK.  If we mistakenly think
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page.  It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them.  But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared.  Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * Computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
+ *
+ * We will scan the table's last page, at least to the extent of
+ * determining whether it has tuples or not, even if it should be skipped
+ * according to the above rules; except when we've already determined that
+ * it's not worth trying to truncate the table.  This avoids having
+ * lazy_truncate_heap() take access-exclusive lock on the table to attempt
+ * a truncation that just fails immediately because there are tuples in
+ * the last page.  This is worth avoiding mainly because such a lock must
+ * be replayed on any hot standby, where it can be disruptive.
+ */
+static BlockNumber
+lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+					   LVScanDesc lvscan, bool *all_visible_according_to_vm,
+					   Buffer *vmbuffer, int options, bool aggressive)
+{
+	BlockNumber blkno;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+
+	if (lvstate->parallel_mode)
+	{
+		/*
+		 * In parallel lazy vacuum since it's hard to know how many consecutive
+		 * all-visible pages exits on table we skip to scan the heap page immediately.
+		 * if it is all-visible page.
+		 */
+		while ((blkno = heap_parallelscan_nextpage(lvscan->heapscan)) != InvalidBlockNumber)
+		{
+			*all_visible_according_to_vm = false;
+			vacuum_delay_point();
+
+			/* Consider to skip scan page according visibility map */
+			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0 &&
+				!FORCE_CHECK_PAGE(blkno))
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, blkno, vmbuffer);
+
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+					{
+						vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+					else if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+						*all_visible_according_to_vm = true;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+					{
+						if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+							vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+				}
+			}
+
+			/* We need to scan current blkno, break */
+			break;
+		}
+	}
+	else
+	{
+		bool skipping_blocks = false;
+
+		/* Initialize lv_nextunskippable_page if needed */
+		if (lvscan->lv_cblock == 0 && (options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+		{
+			while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, lvscan->lv_next_unskippable_block,
+													vmbuffer);
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+						break;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+						break;
+				}
+				vacuum_delay_point();
+				lvscan->lv_next_unskippable_block++;
+			}
+
+			if (lvscan->lv_next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+				skipping_blocks = true;
+			else
+				skipping_blocks = false;
+		}
+
+		/* Decide the block number we need to scan */
+		for (blkno = lvscan->lv_cblock; blkno < lvscan->lv_nblocks; blkno++)
+		{
+			if (blkno == lvscan->lv_next_unskippable_block)
+			{
+				/* Time to advance next_unskippable_block */
+				lvscan->lv_next_unskippable_block++;
+				if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+				{
+					while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+					{
+						uint8		vmstatus;
+
+						vmstatus = visibilitymap_get_status(onerel,
+															lvscan->lv_next_unskippable_block,
+															vmbuffer);
+						if (aggressive)
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+								break;
+						}
+						else
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+								break;
+						}
+						vacuum_delay_point();
+						lvscan->lv_next_unskippable_block++;
+					}
+				}
+
+				/*
+				 * We know we can't skip the current block.  But set up
+				 * skipping_all_visible_blocks to do the right thing at the
+				 * following blocks.
+				 */
+				if (lvscan->lv_next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
+					skipping_blocks = true;
+				else
+					skipping_blocks = false;
+
+				/*
+				 * Normally, the fact that we can't skip this block must mean that
+				 * it's not all-visible.  But in an aggressive vacuum we know only
+				 * that it's not all-frozen, so it might still be all-visible.
+				 */
+				if (aggressive && VM_ALL_VISIBLE(onerel, blkno, vmbuffer))
+					*all_visible_according_to_vm = true;
+
+				/* Found out that next unskippable block number */
+				break;
+			}
+			else
+			{
+				/*
+				 * The current block is potentially skippable; if we've seen a
+				 * long enough run of skippable blocks to justify skipping it, and
+				 * we're not forced to check it, then go ahead and skip.
+				 * Otherwise, the page must be at least all-visible if not
+				 * all-frozen, so we can set all_visible_according_to_vm = true.
+				 */
+				if (skipping_blocks && !FORCE_CHECK_PAGE(blkno))
+				{
+					/*
+					 * Tricky, tricky.  If this is in aggressive vacuum, the page
+					 * must have been all-frozen at the time we checked whether it
+					 * was skippable, but it might not be any more.  We must be
+					 * careful to count it as a skipped all-frozen page in that
+					 * case, or else we'll think we can't update relfrozenxid and
+					 * relminmxid.  If it's not an aggressive vacuum, we don't
+					 * know whether it was all-frozen, so we have to recheck; but
+					 * in this case an approximate answer is OK.
+					 */
+					if (aggressive || VM_ALL_FROZEN(onerel, blkno, vmbuffer))
+						vacrelstats->frozenskipped_pages++;
+					continue;
+				}
+
+				*all_visible_according_to_vm = true;
+
+				/* We need to scan current blkno, break */
+				break;
+			}
+		} /* for */
+
+		/* Advance the current block number for the next scan */
+		lvscan->lv_cblock = blkno + 1;
+	}
+
+	return (blkno == lvscan->lv_nblocks) ? InvalidBlockNumber : blkno;
+}
+
+/*
+ * Begin lazy vacuum scan. lvscan->heapscan is NULL if
+ * we're not in parallel lazy vacuum.
+ */
+static LVScanDesc
+lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan)
+{
+	LVScanDesc lvscan;
+
+	lvscan = (LVScanDesc) palloc(sizeof(LVScanDescData));
+
+	lvscan->lv_cblock = 0;
+	lvscan->lv_next_unskippable_block = 0;
+	lvscan->lv_nblocks = RelationGetNumberOfBlocks(onerel);
+
+	if (pscan != NULL)
+		lvscan->heapscan = heap_beginscan_parallel(onerel, pscan);
+	else
+		lvscan->heapscan = NULL;
+
+	return lvscan;
+}
+
+/*
+ * End lazy vacuum scan.
+ */
+static void
+lv_endscan(LVScanDesc lvscan)
+{
+	if (lvscan->heapscan != NULL)
+		heap_endscan(lvscan->heapscan);
+	pfree(lvscan);
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Lazy Vacuum Support
+ * ----------------------------------------------------------------
+ */
+
+/*
+ * Estimate storage for parallel lazy vacuum.
+ */
+static void
+lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	Size size = 0;
+	int keys = 0;
+	int vacuum_workers = pcxt->nworkers + 1;
+	long maxtuples = lazy_get_max_dead_tuples(vacrelstats);
+
+	/* Estimate size for parallel heap scan */
+	size += heap_parallelscan_estimate(SnapshotAny);
+	keys++;
+
+	/* Estimate size for vacuum statistics for only workers*/
+	size += BUFFERALIGN(mul_size(sizeof(LVRelStats), pcxt->nworkers));
+	keys++;
+
+	/* We have to share dead tuple information only when the table has indexes */
+	if (vacrelstats->nindexes > 0)
+	{
+		/* Estimate size for index statistics */
+		size += BUFFERALIGN(mul_size(sizeof(LVIndStats), vacrelstats->nindexes));
+		keys++;
+
+		/* Estimate size for dead tuple control */
+		size += BUFFERALIGN(sizeof(LVDeadTupleCtl));
+		keys++;
+
+		/* Estimate size for dead tuple array */
+		size += BUFFERALIGN(mul_size(
+							 mul_size(sizeof(ItemPointerData), maxtuples),
+							 vacuum_workers));
+		keys++;
+	}
+
+	/* Estimate size for parallel lazy vacuum state */
+	size += BUFFERALIGN(sizeof(LVParallelState));
+	keys++;
+
+	/* Estimate size for vacuum task */
+	size += BUFFERALIGN(sizeof(VacuumInfo));
+	keys++;
+
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, keys);
+}
+
+/*
+ * Initialize dynamic shared memory for parallel lazy vacuum. We store
+ * relevant informations of parallel heap scanning, dead tuple array
+ * and vacuum statistics for each worker and some parameters for lazy vacuum.
+ */
+static void
+lazy_initialize_dsm(ParallelContext *pcxt, Relation onerel, LVState *lvstate,
+					int options, bool aggressive)
+{
+	ParallelHeapScanDesc pscan_ptr;
+	ItemPointer	deadtuples_ptr;
+	char 		*lvrelstats_ptr;
+	LVParallelState *pstate_ptr;
+	LVIndStats	*indstats_ptr;
+	LVDeadTupleCtl	*dtctl_ptr;
+	int i;
+	int deadtuples_size;
+	int lvrelstats_size;
+	int	vacuum_workers = pcxt->nworkers + 1;
+	long max_tuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+
+	/* Allocate and initialize DSM for vacuum stats for each worker */
+	lvrelstats_size = mul_size(sizeof(LVRelStats), pcxt->nworkers);
+	lvrelstats_ptr = shm_toc_allocate(pcxt->toc, lvrelstats_size);
+	for (i = 0; i < pcxt->nworkers; i++)
+	{
+		char *start;
+
+		start = lvrelstats_ptr + i * sizeof(LVRelStats);
+		memcpy(start, lvstate->vacrelstats, sizeof(LVRelStats));
+	}
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_STATS, lvrelstats_ptr);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Allocate and initialize DSM for dead tuple control */
+		dtctl_ptr = (LVDeadTupleCtl *) shm_toc_allocate(pcxt->toc, sizeof(LVDeadTupleCtl));
+		SpinLockInit(&(dtctl_ptr->mutex));
+		dtctl_ptr->dt_max = max_tuples * vacuum_workers;
+		dtctl_ptr->dt_count = 0;
+		dtctl_ptr->dt_index = 0;
+		lvstate->dtctl = dtctl_ptr;
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLE_CTL, dtctl_ptr);
+
+		/* Allocate and initialize DSM for dead tuple array */
+		deadtuples_size = mul_size(mul_size(sizeof(ItemPointerData), max_tuples),
+								   vacuum_workers);
+		deadtuples_ptr = (ItemPointer) shm_toc_allocate(pcxt->toc,
+														deadtuples_size);
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLES, deadtuples_ptr);
+		lvstate->deadtuples = deadtuples_ptr;
+
+		/* Allocate DSM for index statistics */
+		indstats_ptr = (LVIndStats *) shm_toc_allocate(pcxt->toc,
+													   mul_size(sizeof(LVIndStats),
+																lvstate->vacrelstats->nindexes));
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_INDEX_STATS, indstats_ptr);
+		lvstate->indstats = indstats_ptr;
+	}
+
+	/* Allocate and initialize DSM for parallel scan description */
+	pscan_ptr = (ParallelHeapScanDesc) shm_toc_allocate(pcxt->toc,
+														heap_parallelscan_estimate(SnapshotAny));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_SCAN, pscan_ptr);
+	heap_parallelscan_initialize(pscan_ptr, onerel, SnapshotAny);
+	lvstate->pscan = pscan_ptr;
+
+	/* Allocate and initialize DSM for parallel vacuum state */
+	pstate_ptr = (LVParallelState *) shm_toc_allocate(pcxt->toc, sizeof(LVParallelState));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_STATE, pstate_ptr);
+
+	ConditionVariableInit(&(pstate_ptr->cv));
+	SpinLockInit(&(pstate_ptr->mutex));
+	pstate_ptr->nworkers = vacuum_workers;
+	pstate_ptr->state = VACSTATE_SCAN;
+	pstate_ptr->info.aggressive = aggressive;
+	pstate_ptr->info.options = options;
+	pstate_ptr->info.oldestxmin = OldestXmin;
+	pstate_ptr->info.freezelimit = FreezeLimit;
+	pstate_ptr->info.multixactcutoff = MultiXactCutoff;
+	pstate_ptr->info.elevel = elevel;
+	lvstate->pstate = pstate_ptr;
+}
+
+/*
+ * Initialize parallel lazy vacuum for worker.
+ */
+static LVState *
+lazy_initialize_worker(shm_toc *toc)
+{
+	LVState	*lvstate;
+	char *lvstats;
+
+	lvstate = (LVState *) palloc(sizeof(LVState));
+	lvstate->parallel_mode = true;
+
+	/* Set up vacuum stats */
+	lvstats = shm_toc_lookup(toc, VACUUM_KEY_VACUUM_STATS, false);
+	lvstate->vacrelstats = (LVRelStats *) (lvstats +
+										   sizeof(LVRelStats) * ParallelWorkerNumber);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Set up dead tuple control */
+		lvstate->dtctl = (LVDeadTupleCtl *) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLE_CTL, false);
+
+		/* Set up dead tuple array */
+		lvstate->deadtuples = (ItemPointer) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLES, false);
+
+		/* Set up index statistics */
+		lvstate->indstats = (LVIndStats *) shm_toc_lookup(toc, VACUUM_KEY_INDEX_STATS, false);
+	}
+
+	/* Set up parallel vacuum state */
+	lvstate->pstate = (LVParallelState *) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_STATE, false);
+
+	/* Set up parallel heap scan description */
+	lvstate->pscan = (ParallelHeapScanDesc) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_SCAN, false);
+
+	/* Set up parameters for lazy vacuum */
+	OldestXmin = lvstate->pstate->info.oldestxmin;
+	FreezeLimit = lvstate->pstate->info.freezelimit;
+	MultiXactCutoff = lvstate->pstate->info.multixactcutoff;
+	elevel = lvstate->pstate->info.elevel;
+
+	return lvstate;
+}
+
+/*
+ * In the end of actual vacuumming on table and indexes actually, we have
+ * to wait for other all vacuum workers to reach here before clearing dead
+ * tuple TIDs information.
+ */
+static void
+lazy_end_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+	{
+		lvstate->dtctl->dt_count = 0;
+		return;
+	}
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		/* Fetch shared information */
+		if (!counted)
+			pstate->finish_count++;
+		finish_count = pstate->finish_count;
+		state = pstate->state;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_SCAN)
+			break;
+
+		/*
+		 * Wake up other workers if counted up if first time to reach here and
+		 * is a parallel worker.
+		 */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * If all launched parallel vacuum workers reached here, we can clear the
+		 * dead tuple TIDs information.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			/* Clear dead tuples */
+			lvstate->dtctl->dt_count = 0;
+
+			/* need spinlock ? */
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_SCAN;
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_DONE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Before starting actual vacuuming on table and indexes, we have to wait for
+ * other all vacuum workers so that all worker can see the same dead tuple TIDs
+ * information when vacuuming.
+ */
+static void
+lazy_prepare_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+		return;
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		if (!counted)
+			pstate->finish_count++;
+		state = pstate->state;
+		finish_count = pstate->finish_count;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_VACUUM)
+			break;
+
+		/*
+		 * Wake up other workers if counted up if first time to reach here and
+		 * is a parallel worker.
+		 */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * The leader process can change parallel vacuum state if all workers
+		 * have reached here.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			qsort((void *) lvstate->deadtuples, lvstate->dtctl->dt_count,
+				  sizeof(ItemPointerData), vac_cmp_itemptr);
+
+			/* XXX: need spinlock ? */
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_VACUUM;
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_PREPARE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Return the number of maximum dead tuples can be stored according
+ * to vac_work_mem.
+ */
+static long
+lazy_get_max_dead_tuples(LVRelStats *vacrelstats)
+{
+	long maxtuples;
+	int	vac_work_mem = IsAutoVacuumWorkerProcess() &&
+		autovacuum_work_mem != -1 ?
+		autovacuum_work_mem : maintenance_work_mem;
+
+	if (vacrelstats->nindexes != 0)
+	{
+		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+		maxtuples = Min(maxtuples, INT_MAX);
+		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+
+		/* curious coding here to ensure the multiplication can't overflow */
+		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > vacrelstats->old_rel_pages)
+			maxtuples = vacrelstats->old_rel_pages * LAZY_ALLOC_TUPLES;
+
+		/* stay sane if small maintenance_work_mem */
+		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+	}
+	else
+	{
+		maxtuples = MaxHeapTuplesPerPage;
+	}
+
+	return maxtuples;
+}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 11731da..6b1bd2e 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1663,7 +1663,12 @@ _equalDropdbStmt(const DropdbStmt *a, const DropdbStmt *b)
 static bool
 _equalVacuumStmt(const VacuumStmt *a, const VacuumStmt *b)
 {
-	COMPARE_SCALAR_FIELD(options);
+	if (a->options.flags != b->options.flags)
+		return false;
+
+	if (a->options.nworkers != b->options.nworkers)
+		return false;
+
 	COMPARE_NODE_FIELD(relation);
 	COMPARE_NODE_FIELD(va_cols);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 5eb3981..b97066b 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -187,6 +187,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
 			   bool *deferrable, bool *initdeferred, bool *not_valid,
 			   bool *no_inherit, core_yyscan_t yyscanner);
 static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
+static VacuumOptions *makeVacOpt(VacuumOption flag, int nworkers);
 
 %}
 
@@ -237,6 +238,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	struct ImportQual	*importqual;
 	InsertStmt			*istmt;
 	VariableSetStmt		*vsetstmt;
+	VacuumOptions		*vacopts;
 	PartitionElem		*partelem;
 	PartitionSpec		*partspec;
 	PartitionBoundSpec	*partboundspec;
@@ -305,7 +307,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 				create_extension_opt_item alter_extension_opt_item
 
 %type <ival>	opt_lock lock_type cast_context
-%type <ival>	vacuum_option_list vacuum_option_elem
+%type <vacopts>	vacuum_option_list vacuum_option_elem
 %type <boolean>	opt_or_replace
 				opt_grant_grant_option opt_grant_admin_option
 				opt_nowait opt_if_exists opt_with_data
@@ -10137,47 +10139,59 @@ cluster_index_specification:
 VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose qualified_name
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = $5;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose AnalyzeStmt
 				{
 					VacuumStmt *n = (VacuumStmt *) $5;
-					n->options |= VACOPT_VACUUM;
+					n->options.flags |= VACOPT_VACUUM;
 					if ($2)
-						n->options |= VACOPT_FULL;
+						n->options.flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						n->options.flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						n->options.flags |= VACOPT_VERBOSE;
+					n->options.nworkers = 0;
 					$$ = (Node *)n;
 				}
 			| VACUUM '(' vacuum_option_list ')'
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions *vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *) n;
@@ -10185,29 +10199,52 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
 			| VACUUM '(' vacuum_option_list ')' qualified_name opt_name_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions *vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->relation = $5;
 					n->va_cols = $6;
 					if (n->va_cols != NIL)	/* implies analyze */
-						n->options |= VACOPT_ANALYZE;
+						n->options.flags |= VACOPT_ANALYZE;
 					$$ = (Node *) n;
 				}
 		;
 
 vacuum_option_list:
 			vacuum_option_elem								{ $$ = $1; }
-			| vacuum_option_list ',' vacuum_option_elem		{ $$ = $1 | $3; }
+			| vacuum_option_list ',' vacuum_option_elem
+			{
+				VacuumOptions *vacopts1 = (VacuumOptions *)$1;
+				VacuumOptions *vacopts2 = (VacuumOptions *)$3;
+
+				vacopts1->flags |= vacopts2->flags;
+				if (vacopts2->flags == VACOPT_PARALLEL)
+					vacopts1->nworkers = vacopts2->nworkers;
+
+				$$ = vacopts1;
+				pfree(vacopts2);
+			}
 		;
 
 vacuum_option_elem:
-			analyze_keyword		{ $$ = VACOPT_ANALYZE; }
-			| VERBOSE			{ $$ = VACOPT_VERBOSE; }
-			| FREEZE			{ $$ = VACOPT_FREEZE; }
-			| FULL				{ $$ = VACOPT_FULL; }
+			analyze_keyword		{ $$ = makeVacOpt(VACOPT_ANALYZE, 0); }
+			| VERBOSE			{ $$ = makeVacOpt(VACOPT_VERBOSE, 0); }
+			| FREEZE			{ $$ = makeVacOpt(VACOPT_FREEZE, 0); }
+			| FULL				{ $$ = makeVacOpt(VACOPT_FULL, 0); }
+			| PARALLEL ICONST
+				{
+					if ($2 < 1)
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("parallel vacuum degree must be more than 1"),
+								 parser_errposition(@1)));
+					$$ = makeVacOpt(VACOPT_PARALLEL, $2);
+				}
 			| IDENT
 				{
 					if (strcmp($1, "disable_page_skipping") == 0)
-						$$ = VACOPT_DISABLE_PAGE_SKIPPING;
+						$$ = makeVacOpt(VACOPT_DISABLE_PAGE_SKIPPING, 1);
 					else
 						ereport(ERROR,
 								(errcode(ERRCODE_SYNTAX_ERROR),
@@ -10215,27 +10252,36 @@ vacuum_option_elem:
 									 parser_errposition(@1)));
 				}
 		;
-
 AnalyzeStmt:
 			analyze_keyword opt_verbose
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = NULL;
 					n->va_cols = NIL;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| analyze_keyword opt_verbose qualified_name opt_name_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->relation = $3;
 					n->va_cols = $4;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 		;
 
@@ -15919,6 +15965,16 @@ makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
 	return (Node *) s;
 }
 
+static VacuumOptions *
+makeVacOpt(VacuumOption flag, int nworkers)
+{
+	VacuumOptions *vacopt = palloc(sizeof(VacuumOptions));
+
+	vacopt->flags = flag;
+	vacopt->nworkers = nworkers;
+	return vacopt;
+}
+
 /* parser_init()
  * Initialize to parse one query string
  */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 776b1c0..98b383f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -188,7 +188,7 @@ typedef struct av_relation
 typedef struct autovac_table
 {
 	Oid			at_relid;
-	int			at_vacoptions;	/* bitmask of VacuumOption */
+	VacuumOptions at_vacoptions;	/* contains bitmask of VacuumOption */
 	VacuumParams at_params;
 	int			at_vacuum_cost_delay;
 	int			at_vacuum_cost_limit;
@@ -2463,7 +2463,7 @@ do_autovacuum(void)
 			 * next table in our list.
 			 */
 			HOLD_INTERRUPTS();
-			if (tab->at_vacoptions & VACOPT_VACUUM)
+			if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 				errcontext("automatic vacuum of table \"%s.%s.%s\"",
 						   tab->at_datname, tab->at_nspname, tab->at_relname);
 			else
@@ -2854,10 +2854,11 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab = palloc(sizeof(autovac_table));
 		tab->at_relid = relid;
 		tab->at_sharedrel = classForm->relisshared;
-		tab->at_vacoptions = VACOPT_SKIPTOAST |
+		tab->at_vacoptions.flags = VACOPT_SKIPTOAST |
 			(dovacuum ? VACOPT_VACUUM : 0) |
 			(doanalyze ? VACOPT_ANALYZE : 0) |
 			(!wraparound ? VACOPT_NOWAIT : 0);
+		tab->at_vacoptions.nworkers = 1;
 		tab->at_params.freeze_min_age = freeze_min_age;
 		tab->at_params.freeze_table_age = freeze_table_age;
 		tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
@@ -3104,10 +3105,10 @@ autovac_report_activity(autovac_table *tab)
 	int			len;
 
 	/* Report the command and possible options */
-	if (tab->at_vacoptions & VACOPT_VACUUM)
+	if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: VACUUM%s",
-				 tab->at_vacoptions & VACOPT_ANALYZE ? " ANALYZE" : "");
+				 tab->at_vacoptions.flags & VACOPT_ANALYZE ? " ANALYZE" : "");
 	else
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: ANALYZE");
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index accf302..1e2e2dc 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3606,6 +3606,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
 			event_name = "ParallelBitmapScan";
 			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_PREPARE:
+			event_name = "ParallelVacuumPrepare";
+			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_DONE:
+			event_name = "ParallelVacuumDone";
+			break;
 		case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
 			event_name = "ProcArrayGroupUpdate";
 			break;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 775477c..3669d48 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -669,7 +669,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				VacuumStmt *stmt = (VacuumStmt *) parsetree;
 
 				/* we choose to allow this during "read only" transactions */
-				PreventCommandDuringRecovery((stmt->options & VACOPT_VACUUM) ?
+				PreventCommandDuringRecovery((stmt->options.flags & VACOPT_VACUUM) ?
 											 "VACUUM" : "ANALYZE");
 				/* forbidden in parallel mode due to CommandIsReadOnly */
 				ExecVacuum(stmt, isTopLevel);
@@ -2498,7 +2498,7 @@ CreateCommandTag(Node *parsetree)
 			break;
 
 		case T_VacuumStmt:
-			if (((VacuumStmt *) parsetree)->options & VACOPT_VACUUM)
+			if (((VacuumStmt *) parsetree)->options.flags & VACOPT_VACUUM)
 				tag = "VACUUM";
 			else
 				tag = "ANALYZE";
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 08a08c8..3c2d5df 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2040,7 +2040,6 @@ EstimateSnapshotSpace(Snapshot snap)
 	Size		size;
 
 	Assert(snap != InvalidSnapshot);
-	Assert(snap->satisfies == HeapTupleSatisfiesMVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
 	size = add_size(sizeof(SerializedSnapshotData),
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4e41024..57bea54 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -132,6 +132,7 @@ extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
 							 Relation relation, Snapshot snapshot);
 extern void heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan);
 extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
+extern BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index a903511..1fc10bf 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "access/heapam.h"
+#include "access/parallel.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -157,7 +159,7 @@ extern int	vacuum_multixact_freeze_table_age;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
-extern void vacuum(int options, RangeVar *relation, Oid relid,
+extern void vacuum(VacuumOptions options, RangeVar *relation, Oid relid,
 	   VacuumParams *params, List *va_cols,
 	   BufferAccessStrategy bstrategy, bool isTopLevel);
 extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
@@ -188,8 +190,9 @@ extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
 
 /* in commands/vacuumlazy.c */
-extern void lazy_vacuum_rel(Relation onerel, int options,
+extern void lazy_vacuum_rel(Relation onerel, VacuumOptions options,
 				VacuumParams *params, BufferAccessStrategy bstrategy);
+extern void LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc);
 
 /* in commands/analyze.c */
 extern void analyze_rel(Oid relid, RangeVar *relation, int options,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 3171815..56a6591 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3094,13 +3094,20 @@ typedef enum VacuumOption
 	VACOPT_FULL = 1 << 4,		/* FULL (non-concurrent) vacuum */
 	VACOPT_NOWAIT = 1 << 5,		/* don't wait to get lock (autovacuum only) */
 	VACOPT_SKIPTOAST = 1 << 6,	/* don't process the TOAST table, if any */
-	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7	/* don't skip any pages */
+	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7,	/* don't skip any pages */
+	VACOPT_PARALLEL = 1 << 8	/* do VACUUM parallelly */
 } VacuumOption;
 
+typedef struct VacuumOptions
+{
+	VacuumOption flags; /* OR of VacuumOption flags */
+	int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;
+
 typedef struct VacuumStmt
 {
 	NodeTag		type;
-	int			options;		/* OR of VacuumOption flags */
+	VacuumOptions	options;
 	RangeVar   *relation;		/* single table to process, or NULL */
 	List	   *va_cols;		/* list of column names, or NIL for all */
 } VacuumStmt;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 57ac5d4..1000e71 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -811,6 +811,8 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_PARALLEL_BITMAP_SCAN,
+	WAIT_EVENT_PARALLEL_VACUUM_PREPARE,
+	WAIT_EVENT_PARALLEL_VACUUM_DONE,
 	WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
 	WAIT_EVENT_CLOG_GROUP_UPDATE,
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 6f68663..8887f4d 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -80,6 +80,7 @@ CONTEXT:  SQL function "do_analyze" statement 1
 SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
 DROP TABLE vaccluster;
 DROP TABLE vactst;
 -- partitioned table
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 7c5fb04..cbd8c44 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -61,6 +61,7 @@ VACUUM FULL vaccluster;
 VACUUM FULL vactst;
 
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
 
 DROP TABLE vaccluster;
 DROP TABLE vactst;

#42

Thomas Munro

thomas.munro@enterprisedb.com

over 8 years ago

In reply to: Masahiko Sawada (#41)

Re: Block level parallel vacuum WIP

On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Since v4 patch conflicts with current HEAD I attached the latest version patch.

Hi Sawada-san,

Here is an interesting failure with this patch:

test rowsecurity ... FAILED
test rules ... FAILED

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#43

Masahiko Sawada

sawada.mshk@gmail.com

over 8 years ago

In reply to: Thomas Munro (#42)

Re: Block level parallel vacuum WIP

On Tue, Sep 19, 2017 at 3:33 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Since v4 patch conflicts with current HEAD I attached the latest version patch.

Hi Sawada-san,

Here is an interesting failure with this patch:

test rowsecurity ... FAILED
test rules ... FAILED

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

Thank you for letting me know.

Hmm, it's an interesting failure. I'll investigate it and post the new patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#44

Masahiko Sawada

sawada.mshk@gmail.com

over 8 years ago

In reply to: Masahiko Sawada (#43)

1 attachment(s)

Re: Block level parallel vacuum WIP

On Tue, Sep 19, 2017 at 4:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Sep 19, 2017 at 3:33 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Since v4 patch conflicts with current HEAD I attached the latest version patch.

Hi Sawada-san,

Here is an interesting failure with this patch:

test rowsecurity ... FAILED
test rules ... FAILED

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

Thank you for letting me know.

Hmm, it's an interesting failure. I'll investigate it and post the new patch.

Since the patch conflicts with current HEAD, I've rebased the patch
and fixed a bug. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

parallel_vacuum_v5.patchapplication/octet-stream; name=parallel_vacuum_v5.patchDownload

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index e712226..59e90cb 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -30,6 +30,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="PARAMET
     FREEZE
     VERBOSE
     ANALYZE
+    PARALLEL
     DISABLE_PAGE_SKIPPING
 
 <phrase>and <replaceable class="PARAMETER">table_and_columns</replaceable> is:</phrase>
@@ -142,6 +143,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="PARAMET
    </varlistentry>
 
    <varlistentry>
+    <term><literal>PARALLEL <replaceable class="PARAMETER">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable class="PARAMETER">N
+      </replaceable> background workers. Collecting garbage on table is processed
+      in block-level parallel. For tables with indexes, parallel vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a index are processed
+      by particular parallel vacuum worker. This option can not use with <literal>FULL</>
+      option.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>DISABLE_PAGE_SKIPPING</literal></term>
     <listitem>
      <para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c435482..8e64650 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -91,7 +91,6 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
 						bool is_samplescan,
 						bool temp_snap);
 static void heap_parallelscan_startblock_init(HeapScanDesc scan);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1715,7 +1714,7 @@ retry:
  *		first backend gets an InvalidBlockNumber return.
  * ----------------
  */
-static BlockNumber
+BlockNumber
 heap_parallelscan_nextpage(HeapScanDesc scan)
 {
 	BlockNumber page;
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index c6f7b7a..dd22bb9 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -20,6 +20,7 @@
 #include "access/xlog.h"
 #include "catalog/namespace.h"
 #include "commands/async.h"
+#include "commands/vacuum.h"
 #include "executor/execParallel.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -122,6 +123,9 @@ static const struct
 {
 	{
 		"ParallelQueryMain", ParallelQueryMain
+	},
+	{
+		"LazyVacuumWorkerMain", LazyVacuumWorkerMain
 	}
 };
 
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index f439b55..4c9ac4f 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -74,7 +74,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
 				  MultiXactId lastSaneMinMulti);
-static bool vacuum_rel(Oid relid, RangeVar *relation, int options,
+static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options,
 		   VacuumParams *params);
 
 /*
@@ -89,15 +89,15 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
 	VacuumParams params;
 
 	/* sanity checks on options */
-	Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
-	Assert((vacstmt->options & VACOPT_VACUUM) ||
-		   !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
-	Assert(!(vacstmt->options & VACOPT_SKIPTOAST));
+	Assert(vacstmt->options.flags & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert((vacstmt->options.flags & VACOPT_VACUUM) ||
+		   !(vacstmt->options.flags & (VACOPT_FULL | VACOPT_FREEZE)));
+	Assert(!(vacstmt->options.flags & VACOPT_SKIPTOAST));
 
 	/*
 	 * Make sure VACOPT_ANALYZE is specified if any column lists are present.
 	 */
-	if (!(vacstmt->options & VACOPT_ANALYZE))
+	if (!(vacstmt->options.flags & VACOPT_ANALYZE))
 	{
 		ListCell   *lc;
 
@@ -116,7 +116,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
 	 * All freeze ages are zero if the FREEZE option is given; otherwise pass
 	 * them as -1 which means to use the default values.
 	 */
-	if (vacstmt->options & VACOPT_FREEZE)
+	if (vacstmt->options.flags & VACOPT_FREEZE)
 	{
 		params.freeze_min_age = 0;
 		params.freeze_table_age = 0;
@@ -163,7 +163,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
  * memory context that will not disappear at transaction commit.
  */
 void
-vacuum(int options, List *relations, VacuumParams *params,
+vacuum(VacuumOptions options, List *relations, VacuumParams *params,
 	   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	static bool in_vacuum = false;
@@ -174,7 +174,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 
 	Assert(params != NULL);
 
-	stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
+	stmttype = (options.flags & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
 
 	/*
 	 * We cannot run VACUUM inside a user transaction block; if we were inside
@@ -184,7 +184,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 	 *
 	 * ANALYZE (without VACUUM) can run either way.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 	{
 		PreventTransactionChain(isTopLevel, stmttype);
 		in_outer_xact = false;
@@ -206,17 +206,26 @@ vacuum(int options, List *relations, VacuumParams *params,
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
-	if ((options & VACOPT_FULL) != 0 &&
-		(options & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
 
 	/*
+	 * Sanity check PARALLEL option.
+	 */
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_PARALLEL) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("VACUUM option PARALLEL cannnot be used with FULL")));
+
+	/*
 	 * Send info about dead objects to the statistics collector, unless we are
 	 * in autovacuum --- autovacuum.c does this for itself.
 	 */
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 		pgstat_vacuum_stat();
 
 	/*
@@ -281,11 +290,11 @@ vacuum(int options, List *relations, VacuumParams *params,
 	 * transaction block, and also in an autovacuum worker, use own
 	 * transactions so we can release locks sooner.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 		use_own_xacts = true;
 	else
 	{
-		Assert(options & VACOPT_ANALYZE);
+		Assert(options.flags & VACOPT_ANALYZE);
 		if (IsAutoVacuumWorkerProcess())
 			use_own_xacts = true;
 		else if (in_outer_xact)
@@ -335,13 +344,13 @@ vacuum(int options, List *relations, VacuumParams *params,
 		{
 			VacuumRelation *vrel = lfirst_node(VacuumRelation, cur);
 
-			if (options & VACOPT_VACUUM)
+			if (options.flags & VACOPT_VACUUM)
 			{
 				if (!vacuum_rel(vrel->oid, vrel->relation, options, params))
 					continue;
 			}
 
-			if (options & VACOPT_ANALYZE)
+			if (options.flags & VACOPT_ANALYZE)
 			{
 				/*
 				 * If using separate xacts, start one for analyze. Otherwise,
@@ -354,7 +363,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 					PushActiveSnapshot(GetTransactionSnapshot());
 				}
 
-				analyze_rel(vrel->oid, vrel->relation, options, params,
+				analyze_rel(vrel->oid, vrel->relation, options.flags, params,
 							vrel->va_cols, in_outer_xact, vac_strategy);
 
 				if (use_own_xacts)
@@ -390,7 +399,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 		StartTransactionCommand();
 	}
 
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 	{
 		/*
 		 * Update pg_database.datfrozenxid, and truncate pg_xact if possible.
@@ -1318,7 +1327,7 @@ vac_truncate_clog(TransactionId frozenXID,
  *		At entry and exit, we are not inside a transaction.
  */
 static bool
-vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
+vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options, VacuumParams *params)
 {
 	LOCKMODE	lmode;
 	Relation	onerel;
@@ -1339,7 +1348,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 */
 	PushActiveSnapshot(GetTransactionSnapshot());
 
-	if (!(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_FULL))
 	{
 		/*
 		 * In lazy vacuum, we can set the PROC_IN_VACUUM flag, which lets
@@ -1379,7 +1388,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * vacuum, but just ShareUpdateExclusiveLock for concurrent vacuum. Either
 	 * way, we can be sure that no other backend is vacuuming the same table.
 	 */
-	lmode = (options & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+	lmode = (options.flags & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * Open the relation and get the appropriate lock on it.
@@ -1390,7 +1399,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * If we've been asked not to wait for the relation lock, acquire it first
 	 * in non-blocking mode, before calling try_relation_open().
 	 */
-	if (!(options & VACOPT_NOWAIT))
+	if (!(options.flags & VACOPT_NOWAIT))
 		onerel = try_relation_open(relid, lmode);
 	else if (ConditionalLockRelationOid(relid, lmode))
 		onerel = try_relation_open(relid, NoLock);
@@ -1510,7 +1519,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * us to process it.  In VACUUM FULL, though, the toast table is
 	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
 	 */
-	if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_SKIPTOAST) && !(options.flags & VACOPT_FULL))
 		toast_relid = onerel->rd_rel->reltoastrelid;
 	else
 		toast_relid = InvalidOid;
@@ -1529,7 +1538,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	/*
 	 * Do the actual work --- either FULL or "lazy" vacuum
 	 */
-	if (options & VACOPT_FULL)
+	if (options.flags & VACOPT_FULL)
 	{
 		/* close relation before vacuuming, but hold lock until commit */
 		relation_close(onerel, NoLock);
@@ -1537,7 +1546,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 		cluster_rel(relid, InvalidOid, false,
-					(options & VACOPT_VERBOSE) != 0);
+					(options.flags & VACOPT_VERBOSE) != 0);
 	}
 	else
 		lazy_vacuum_rel(onerel, options, params, vac_strategy);
@@ -1591,8 +1600,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
  * hit dangling index pointers.
  */
 void
-vac_open_indexes(Relation relation, LOCKMODE lockmode,
-				 int *nindexes, Relation **Irel)
+vac_open_indexes(Relation relation, LOCKMODE lockmode, int *nindexes, Relation **Irel)
 {
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 30b1c08..f9be38e 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -22,6 +22,20 @@
  * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * In PostgreSQL 11, we support a parallel option for lazy vacuum. In parallel
+ * lazy vacuum, multiple vacuum worker processes get blocks in parallel using
+ * parallel heap scan and process them. If a table with indexes the parallel
+ * vacuum workers vacuum the heap and indexes in parallel.  Also, since dead
+ * tuple TIDs is shared with all vacuum processes including the leader process
+ * the parallel vacuum processes have to make two synchronization points in
+ * lazy vacuum processing: when before starting vacuum and when before clearing
+ * dead tuple TIDs. In these two points the leader treats dead tuple TIDs as
+ * an arbiter. The information required by parallel lazy vacuum such as the
+ * statistics of table, parallel heap scan description have to be shared with
+ * all vacuum processes, and table statistics are funneled by the leader
+ * process after finished. Note that dead tuple TIDs need to be shared only
+ * when the table has indexes. For table with no indexes, each parallel worker
+ * processes blocks and vacuum them independently.
  *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -41,8 +55,10 @@
 #include "access/heapam_xlog.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/relscan.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -54,6 +70,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "utils/lsyscache.h"
@@ -62,6 +79,7 @@
 #include "utils/timestamp.h"
 #include "utils/tqual.h"
 
+//#define PLV_TIME
 
 /*
  * Space/time tradeoff parameters: do these need to be user-tunable?
@@ -103,10 +121,81 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+/* DSM key for parallel lazy vacuum */
+#define VACUUM_KEY_PARALLEL_SCAN	UINT64CONST(0xFFFFFFFFFFF00001)
+#define VACUUM_KEY_VACUUM_STATS		UINT64CONST(0xFFFFFFFFFFF00002)
+#define VACUUM_KEY_INDEX_STATS	    UINT64CONST(0xFFFFFFFFFFF00003)
+#define VACUUM_KEY_DEAD_TUPLE_CTL	UINT64CONST(0xFFFFFFFFFFF00004)
+#define VACUUM_KEY_DEAD_TUPLES		UINT64CONST(0xFFFFFFFFFFF00005)
+#define VACUUM_KEY_PARALLEL_STATE	UINT64CONST(0xFFFFFFFFFFF00006)
+
+/*
+ * see note of lazy_scan_heap_get_nextpage about forcing scanning of
+ * last page
+ */
+#define FORCE_CHECK_PAGE(blk) \
+	(blkno == (blk - 1) && should_attempt_truncation(vacrelstats))
+
+/* Check if given index is assigned to this parallel vacuum worker */
+#define IsAssignedIndex(i, pstate) \
+	(pstate == NULL || \
+	 (((i) % ((LVParallelState *) (pstate))->nworkers -1 ) == ParallelWorkerNumber))
+
+#define IsDeadTupleShared(lvstate) \
+	((LVState *)(lvstate))->parallel_mode && \
+	((LVState *)(lvstate))->vacrelstats->nindexes > 0
+
+/* Vacuum worker state for parallel lazy vacuum */
+#define VACSTATE_SCAN			0x1	/* heap scan phase */
+#define VACSTATE_VACUUM			0x2	/* vacuuming on table and index */
+
+/*
+ * Vacuum relevant options and thresholds we need share with parallel
+ * vacuum workers.
+ */
+typedef struct VacuumInfo
+{
+	int				options;	/* VACUUM options */
+	bool			aggressive;	/* does each worker need to aggressive vacuum? */
+	TransactionId	oldestxmin;
+	TransactionId	freezelimit;
+	MultiXactId		multixactcutoff;
+	int				elevel;
+} VacuumInfo;
+
+/* Struct for index statistics that are used for parallel lazy vacuum */
+typedef struct LVIndStats
+{
+	bool		updated;	/* need to be updated? */
+	BlockNumber	num_pages;
+	BlockNumber	num_tuples;
+} LVIndStats;
+
+/* Struct for parallel lazy vacuum state */
+typedef struct LVParallelState
+{
+	int nworkers;			/* # of process doing vacuum */
+	VacuumInfo	info;
+	int	state;				/* current parallel vacuum status */
+	int	finish_count;
+	ConditionVariable cv;
+	slock_t	mutex;
+} LVParallelState;
+
+/* Struct for control dead tuple TIDs array */
+typedef struct LVDeadTupleCtl
+{
+	int			dt_max;	/* # slots allocated in array */
+	int 		dt_count; /* # of dead tuple */
+
+	/* Used only for parallel lazy vacuum */
+	int			dt_index;
+	slock_t 	mutex;
+} LVDeadTupleCtl;
+
 typedef struct LVRelStats
 {
-	/* hasindex = true means two-pass strategy; false means one-pass */
-	bool		hasindex;
+	int			nindexes; /* > 0 means two-pass strategy; = 0 means one-pass */
 	/* Overall statistics about rel */
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
@@ -118,19 +207,46 @@ typedef struct LVRelStats
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
 	double		new_dead_tuples;	/* new estimated total # of dead tuples */
-	BlockNumber pages_removed;
 	double		tuples_deleted;
-	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
-	/* List of TIDs of tuples we intend to delete */
-	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
 	int			num_index_scans;
+	BlockNumber pages_removed;
+	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+/* Struct for lazy vacuum execution */
+typedef struct LVState
+{
+	bool		parallel_mode;
+	LVRelStats *vacrelstats;
+	/*
+	 * Used when both parallel and non-parallel lazy vacuum, but in parallel
+	 * lazy vacuum and table with index, dtctl points to a dynamic shared memory
+	 * and controlled by dtctl struct.
+	 */
+	LVDeadTupleCtl	*dtctl;
+	ItemPointer	deadtuples;
+
+	/* Used only for parallel lazy vacuum */
+	ParallelContext *pcxt;
+	LVParallelState *pstate;
+	ParallelHeapScanDesc pscan;
+	LVIndStats *indstats;
+} LVState;
+
+/*
+ * Scan description data for lazy vacuum. In parallel lazy vacuum,
+ * we use only heapscan instead.
+ */
+typedef struct LVScanDescData
+{
+	BlockNumber lv_cblock;					/* current scanning block number */
+	BlockNumber lv_next_unskippable_block;	/* next block number we cannot skip */
+	BlockNumber lv_nblocks;					/* the number blocks of relation */
+	HeapScanDesc heapscan;					/* field for parallel lazy vacuum */
+} LVScanDescData;
+typedef struct LVScanDescData *LVScanDesc;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -141,32 +257,47 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
-
-/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
-			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
-			   bool aggressive);
-static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
+/* nonf-export function prototypes */
+static void lazy_vacuum_heap(Relation onerel, LVState *lvstate);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats);
-static void lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats);
+							  LVState *lvstate);
+static void lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
 						 LVRelStats *vacrelstats);
-static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
-static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr);
+static void lazy_space_alloc(LVState *lvstate, BlockNumber relblocks);
+static void lazy_record_dead_tuple(LVState *state, ItemPointer itemptr);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static void do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irels,
+							  int nindexes, int options, bool aggressive);
+static void lazy_scan_heap(Relation rel, LVState *lvstate, VacuumOptions options,
+						   bool aggressive);
+
+/* function prototypes for parallel vacuum */
+static void lazy_gather_vacuum_stats(ParallelContext *pxct,
+									 LVRelStats *valrelstats);
+static void lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats);
+static void lazy_initialize_dsm(ParallelContext *pcxt, Relation onrel,
+								LVState *lvstate, int options, bool aggressive);
+static LVState *lazy_initialize_worker(shm_toc *toc);
+static LVScanDesc lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan);
+static void lv_endscan(LVScanDesc lvscan);
+static BlockNumber lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+											   LVScanDesc lvscan,
+											   bool *all_visible_according_to_vm,
+											   Buffer *vmbuffer, int options, bool aggressive);
+static void lazy_prepare_vacuum(LVState *lvstate);
+static void lazy_end_vacuum(LVState *lvstate);
+static long lazy_get_max_dead_tuples(LVRelStats *vacrelstats);
 
 
 /*
@@ -179,12 +310,11 @@ static bool heap_page_is_all_visible(Relation rel, Buffer buf,
  *		and locked the relation.
  */
 void
-lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+lazy_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
 				BufferAccessStrategy bstrategy)
 {
-	LVRelStats *vacrelstats;
-	Relation   *Irel;
-	int			nindexes;
+	LVState		*lvstate;
+	LVRelStats	*vacrelstats;
 	PGRUsage	ru0;
 	TimestampTz starttime = 0;
 	long		secs;
@@ -211,7 +341,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		starttime = GetCurrentTimestamp();
 	}
 
-	if (options & VACOPT_VERBOSE)
+	if (options.flags & VACOPT_VERBOSE)
 		elevel = INFO;
 	else
 		elevel = DEBUG2;
@@ -239,10 +369,12 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 											   xidFullScanLimit);
 	aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
 											  mxactFullScanLimit);
-	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+	if (options.flags & VACOPT_DISABLE_PAGE_SKIPPING)
 		aggressive = true;
 
+	lvstate = (LVState *) palloc0(sizeof(LVState));
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
+	lvstate->vacrelstats = vacrelstats;
 
 	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
 	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
@@ -250,15 +382,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
-	/* Open all indexes of the relation */
-	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
-	vacrelstats->hasindex = (nindexes > 0);
-
 	/* Do the vacuuming */
-	lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
-
-	/* Done with indexes */
-	vac_close_indexes(nindexes, Irel, NoLock);
+	lazy_scan_heap(onerel, lvstate, options, aggressive);
 
 	/*
 	 * Compute whether we actually scanned the all unfrozen pages. If we did,
@@ -267,7 +392,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	 * NB: We need to check this before truncating the relation, because that
 	 * will change ->rel_pages.
 	 */
-	if ((vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
+	if ((lvstate->vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
 		< vacrelstats->rel_pages)
 	{
 		Assert(!aggressive);
@@ -329,7 +454,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 						new_rel_pages,
 						new_rel_tuples,
 						new_rel_allvisible,
-						vacrelstats->hasindex,
+						(vacrelstats->nindexes != 0),
 						new_frozen_xid,
 						new_min_multi,
 						false);
@@ -439,28 +564,166 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
- *	lazy_scan_heap() -- scan an open heap relation
+ * If the number of workers is specified more than 0, we enter the parallel lazy
+ * vacuum mode. In parallel lazy vacuum mode, we initialize a dynamic shared memory
+ * and launch parallel vacuum workers. The launcher process also vacuums the table
+ * after launched and then waits for the all vacuum workers to finish. After all vacuum
+ * workers finished we gather the vacuum statistics of table and indexes, and update
+ * them.
+ */
+static void
+lazy_scan_heap(Relation onerel, LVState *lvstate, VacuumOptions options,
+			   bool aggressive)
+{
+	ParallelContext	*pcxt;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+	Relation	*Irel;
+	int			nindexes;
+
+	lvstate->parallel_mode = options.nworkers > 0;
+
+	/* Open indexes */
+	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
+	vacrelstats->nindexes = nindexes;
+
+	if (lvstate->parallel_mode)
+	{
+		EnterParallelMode();
+
+		/* Create parallel context and initialize it */
+		pcxt = CreateParallelContext("postgres", "LazyVacuumWorkerMain",
+									 options.nworkers);
+		lvstate->pcxt = pcxt;
+
+		/* Estimate DSM size for parallel vacuum */
+		lazy_estimate_dsm(pcxt, lvstate->vacrelstats);
+
+		/* Initialize DSM for parallel vacuum */
+		InitializeParallelDSM(pcxt);
+		lazy_initialize_dsm(pcxt, onerel, lvstate, options.flags, aggressive);
+
+		/* Launch workers */
+		LaunchParallelWorkers(pcxt);
+	}
+
+	do_lazy_scan_heap(lvstate, onerel, Irel, nindexes, options.flags, aggressive);
+
+	/*
+	 * We can update relation statistics such as scanned page after gathered
+	 * statistics from all workers. Also, in parallel mode since we cannot update
+	 * index statistics at the same time the leader process have to do it.
+	 *
+	 * XXX : If we allows workers to update statistics tuples at the same time
+	 * the updating index statistics can be done in lazy_cleanup_index().
+	 */
+	if (lvstate->parallel_mode)
+	{
+		int i;
+		LVIndStats *indstats = palloc(sizeof(LVIndStats) * lvstate->vacrelstats->nindexes);
+
+		/* Wait for workers finished vacuum */
+		WaitForParallelWorkersToFinish(pcxt);
+
+		/* Gather the result of vacuum statistics from all workers */
+		lazy_gather_vacuum_stats(pcxt, vacrelstats);
+
+		/* Now we can compute the new value for pg_class.reltuples */
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 vacrelstats->rel_pages,
+															 vacrelstats->scanned_pages,
+															 vacrelstats->scanned_tuples);
+
+		/* Copy new index stats to local memory */
+		memcpy(indstats, lvstate->indstats, sizeof(LVIndStats) * vacrelstats->nindexes);
+
+		DestroyParallelContext(pcxt);
+		ExitParallelMode();
+
+		/* After exit parallel mode, update index statistics */
+		for (i = 0; i < vacrelstats->nindexes; i++)
+		{
+			Relation	ind = Irel[i];
+			LVIndStats *indstat = (LVIndStats *) &(indstats[i]);
+
+			if (indstat->updated)
+			   vac_update_relstats(ind,
+								   indstat->num_pages,
+								   indstat->num_tuples,
+								   0,
+								   false,
+								   InvalidTransactionId,
+								   InvalidMultiXactId,
+								   false);
+		}
+	}
+
+	vac_close_indexes(nindexes, Irel, RowExclusiveLock);
+}
+
+/*
+ * Entry point of parallel vacuum worker.
+ */
+void
+LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc)
+{
+	LVState		*lvstate;
+	Relation rel;
+	Relation *indrel;
+	int nindexes_worker;
+
+	/* Look up dynamic shared memory and initialize */
+	lvstate = lazy_initialize_worker(toc);
+
+	Assert(lvstate != NULL);
+
+	rel = relation_open(lvstate->pscan->phs_relid, ShareUpdateExclusiveLock);
+
+	/* Open all indexes */
+	vac_open_indexes(rel, RowExclusiveLock, &nindexes_worker,
+					 &indrel);
+
+	/* Do lazy vacuum */
+	do_lazy_scan_heap(lvstate, rel, indrel, lvstate->vacrelstats->nindexes,
+					  lvstate->pstate->info.options, lvstate->pstate->info.aggressive);
+
+	vac_close_indexes(lvstate->vacrelstats->nindexes, indrel, RowExclusiveLock);
+	heap_close(rel, ShareUpdateExclusiveLock);
+}
+
+/*
+ *	do_lazy_scan_heap() -- scan an open heap relation
  *
  *		This routine prunes each page in the heap, which will among other
  *		things truncate dead tuples to dead line pointers, defragment the
- *		page, and set commit status bits (see heap_page_prune).  It also builds
+ *		page, and set commit status bits (see heap_page_prune).  It also uses
  *		lists of dead tuples and pages with free space, calculates statistics
  *		on the number of live tuples in the heap, and marks pages as
  *		all-visible if appropriate.  When done, or when we run low on space for
- *		dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
- *		to reclaim dead line pointers.
+ *		dead-tuple TIDs, invoke vacuuming of assigned indexes and call lazy_vacuum_heap
+ *		to reclaim dead line pointers. In parallel vacuum, we need to synchronize
+ *		at where scanning heap finished and vacuuming heap finished. The vacuum
+ *		worker reached to that point first need to wait for other vacuum workers
+ *		reached to the same point.
+ *
+ *		In parallel lazy scan, we get next page number using parallel heap scan.
+ *		Since the dead tuple TIDs are shared with all vacuum workers, we have to
+ *		wait for all other workers to reach to the same points where before starting
+ *		reclaiming dead tuple TIDs and before clearing dead tuple TIDs information
+ *		in dynamic shared memory.
  *
  *		If there are no indexes then we can reclaim line pointers on the fly;
  *		dead line pointers need only be retained until all index pointers that
  *		reference them have been killed.
  */
 static void
-lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive)
+do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irel,
+				  int nindexes, int options, bool aggressive)
 {
-	BlockNumber nblocks,
-				blkno;
+	LVRelStats *vacrelstats = lvstate->vacrelstats;
+	BlockNumber blkno;
+	BlockNumber nblocks;
 	HeapTupleData tuple;
+	LVScanDesc lvscan;
 	char	   *relname;
 	BlockNumber empty_pages,
 				vacuumed_pages;
@@ -471,11 +734,15 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	IndexBulkDeleteResult **indstats;
 	int			i;
 	PGRUsage	ru0;
+#ifdef PLV_TIME
+	PGRUsage	ru_scan;
+	PGRUsage	ru_vacuum;
+#endif
 	Buffer		vmbuffer = InvalidBuffer;
-	BlockNumber next_unskippable_block;
-	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
+	bool		all_visible_according_to_vm = false;
+
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -504,89 +771,24 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->nonempty_pages = 0;
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
-	lazy_space_alloc(vacrelstats, nblocks);
+	lazy_space_alloc(lvstate, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
+	/* Begin heap scan for vacuum */
+	lvscan = lv_beginscan(onerel, lvstate->pscan);
+
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = vacrelstats->max_dead_tuples;
+	initprog_val[2] = lvstate->dtctl->dt_max;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
-	/*
-	 * Except when aggressive is set, we want to skip pages that are
-	 * all-visible according to the visibility map, but only when we can skip
-	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
-	 * sequentially, the OS should be doing readahead for us, so there's no
-	 * gain in skipping a page now and then; that's likely to disable
-	 * readahead and so be counterproductive. Also, skipping even a single
-	 * page means that we can't update relfrozenxid, so we only want to do it
-	 * if we can skip a goodly number of pages.
-	 *
-	 * When aggressive is set, we can't skip pages just because they are
-	 * all-visible, but we can still skip pages that are all-frozen, since
-	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
-	 *
-	 * Before entering the main loop, establish the invariant that
-	 * next_unskippable_block is the next block number >= blkno that we can't
-	 * skip based on the visibility map, either all-visible for a regular scan
-	 * or all-frozen for an aggressive scan.  We set it to nblocks if there's
-	 * no such block.  We also set up the skipping_blocks flag correctly at
-	 * this stage.
-	 *
-	 * Note: The value returned by visibilitymap_get_status could be slightly
-	 * out-of-date, since we make this test before reading the corresponding
-	 * heap page or locking the buffer.  This is OK.  If we mistakenly think
-	 * that the page is all-visible or all-frozen when in fact the flag's just
-	 * been cleared, we might fail to vacuum the page.  It's easy to see that
-	 * skipping a page when aggressive is not set is not a very big deal; we
-	 * might leave some dead tuples lying around, but the next vacuum will
-	 * find them.  But even when aggressive *is* set, it's still OK if we miss
-	 * a page whose all-frozen marking has just been cleared.  Any new XIDs
-	 * just added to that page are necessarily newer than the GlobalXmin we
-	 * computed, so they'll have no effect on the value to which we can safely
-	 * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
-	 *
-	 * We will scan the table's last page, at least to the extent of
-	 * determining whether it has tuples or not, even if it should be skipped
-	 * according to the above rules; except when we've already determined that
-	 * it's not worth trying to truncate the table.  This avoids having
-	 * lazy_truncate_heap() take access-exclusive lock on the table to attempt
-	 * a truncation that just fails immediately because there are tuples in
-	 * the last page.  This is worth avoiding mainly because such a lock must
-	 * be replayed on any hot standby, where it can be disruptive.
-	 */
-	next_unskippable_block = 0;
-	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-	{
-		while (next_unskippable_block < nblocks)
-		{
-			uint8		vmstatus;
-
-			vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
-												&vmbuffer);
-			if (aggressive)
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-					break;
-			}
-			else
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-					break;
-			}
-			vacuum_delay_point();
-			next_unskippable_block++;
-		}
-	}
-
-	if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
-		skipping_blocks = true;
-	else
-		skipping_blocks = false;
-
-	for (blkno = 0; blkno < nblocks; blkno++)
+#ifdef PLV_TIME
+	pg_rusage_init(&ru_scan);
+#endif
+	while((blkno = lazy_scan_get_nextpage(onerel, lvstate, lvscan,
+										  &all_visible_according_to_vm,
+										  &vmbuffer, options, aggressive)) != InvalidBlockNumber)
 	{
 		Buffer		buf;
 		Page		page;
@@ -597,99 +799,31 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		int			prev_dead_count;
 		int			nfrozen;
 		Size		freespace;
-		bool		all_visible_according_to_vm = false;
 		bool		all_visible;
 		bool		all_frozen = true;	/* provided all_visible is also true */
 		bool		has_dead_tuples;
 		TransactionId visibility_cutoff_xid = InvalidTransactionId;
-
-		/* see note above about forcing scanning of last page */
-#define FORCE_CHECK_PAGE() \
-		(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
+		int			dtmax;
+		int			dtcount;
 
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
-		if (blkno == next_unskippable_block)
-		{
-			/* Time to advance next_unskippable_block */
-			next_unskippable_block++;
-			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-			{
-				while (next_unskippable_block < nblocks)
-				{
-					uint8		vmskipflags;
-
-					vmskipflags = visibilitymap_get_status(onerel,
-														   next_unskippable_block,
-														   &vmbuffer);
-					if (aggressive)
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-							break;
-					}
-					else
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-							break;
-					}
-					vacuum_delay_point();
-					next_unskippable_block++;
-				}
-			}
-
-			/*
-			 * We know we can't skip the current block.  But set up
-			 * skipping_blocks to do the right thing at the following blocks.
-			 */
-			if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
-				skipping_blocks = true;
-			else
-				skipping_blocks = false;
-
-			/*
-			 * Normally, the fact that we can't skip this block must mean that
-			 * it's not all-visible.  But in an aggressive vacuum we know only
-			 * that it's not all-frozen, so it might still be all-visible.
-			 */
-			if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
-				all_visible_according_to_vm = true;
-		}
-		else
-		{
-			/*
-			 * The current block is potentially skippable; if we've seen a
-			 * long enough run of skippable blocks to justify skipping it, and
-			 * we're not forced to check it, then go ahead and skip.
-			 * Otherwise, the page must be at least all-visible if not
-			 * all-frozen, so we can set all_visible_according_to_vm = true.
-			 */
-			if (skipping_blocks && !FORCE_CHECK_PAGE())
-			{
-				/*
-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was all-frozen, so we have to recheck; but
-				 * in this case an approximate answer is OK.
-				 */
-				if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
-					vacrelstats->frozenskipped_pages++;
-				continue;
-			}
-			all_visible_according_to_vm = true;
-		}
-
 		vacuum_delay_point();
 
 		/*
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
 		 */
-		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
-			vacrelstats->num_dead_tuples > 0)
+		if (IsDeadTupleShared(lvstate))
+			SpinLockAcquire(&lvstate->dtctl->mutex);
+
+		dtmax = lvstate->dtctl->dt_max;
+		dtcount = lvstate->dtctl->dt_count;
+
+		if (IsDeadTupleShared(lvstate))
+			SpinLockRelease(&lvstate->dtctl->mutex);
+
+		if (((dtmax - dtcount) < MaxHeapTuplesPerPage) && dtcount > 0)
 		{
 			const int	hvp_index[] = {
 				PROGRESS_VACUUM_PHASE,
@@ -697,6 +831,19 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			};
 			int64		hvp_val[2];
 
+#ifdef PLV_TIME
+			elog(WARNING, "%d Scan %s", ParallelWorkerNumber, pg_rusage_show(&ru_scan));
+#endif
+			/*
+			 * Here we're about to vacuum the table and indexes actually. Before
+			 * entering vacuum state, we have to wait for other vacuum worker to
+			 * reach here.
+			 */
+			lazy_prepare_vacuum(lvstate);
+#ifdef PLV_TIME
+			pg_rusage_init(&ru_vacuum);
+#endif
+
 			/*
 			 * Before beginning index vacuuming, we release any pin we may
 			 * hold on the visibility map page.  This isn't necessary for
@@ -716,11 +863,12 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
 
-			/* Remove index entries */
+			/* Remove assigned index entries */
 			for (i = 0; i < nindexes; i++)
-				lazy_vacuum_index(Irel[i],
-								  &indstats[i],
-								  vacrelstats);
+			{
+				if (IsAssignedIndex(i, lvstate->pstate))
+					lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+			}
 
 			/*
 			 * Report that we are now vacuuming the heap.  We also increase
@@ -733,19 +881,28 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
 
 			/* Remove tuples from heap */
-			lazy_vacuum_heap(onerel, vacrelstats);
+			lazy_vacuum_heap(onerel, lvstate);
 
+#ifdef PLV_TIME
+			elog(WARNING, "%d VACUUM : %s", ParallelWorkerNumber, pg_rusage_show(&ru_vacuum));
+#endif
 			/*
-			 * Forget the now-vacuumed tuples, and press on, but be careful
-			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
+			 * Here we've done vacuum on the heap and index and we are going
+			 * to begin the next round scan on heap. Wait until all vacuum worker
+			 * finished vacuum. After all vacuum workers finished, forget the
+			 * now-vacuumed tuples, and press on, but be careful not to reset
+			 * latestRemoveXid since we want that value to be valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
-			vacrelstats->num_index_scans++;
+			lazy_end_vacuum(lvstate);
+#ifdef PLV_TIME
+			pg_rusage_init(&ru_scan);
+#endif
 
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			vacrelstats->num_index_scans++;
 		}
 
 		/*
@@ -771,7 +928,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * it's OK to skip vacuuming pages we get a lock conflict on. They
 			 * will be dealt with in some future vacuum.
 			 */
-			if (!aggressive && !FORCE_CHECK_PAGE())
+			if (!aggressive && !FORCE_CHECK_PAGE(blkno))
 			{
 				ReleaseBuffer(buf);
 				vacrelstats->pinskipped_pages++;
@@ -923,7 +1080,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
+		prev_dead_count = lvstate->dtctl->dt_count;
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -962,7 +1119,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 */
 			if (ItemIdIsDead(itemid))
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				all_visible = false;
 				continue;
 			}
@@ -1067,7 +1224,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			if (tupgone)
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
 													   &vacrelstats->latestRemovedXid);
 				tups_vacuumed += 1;
@@ -1132,13 +1289,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/*
 		 * If there are no indexes then we can vacuum the page right now
-		 * instead of doing a second scan.
+		 * instead of doing a second scan. Because each parallel worker uses its
+		 * own dead tuple area they can vacuum independently.
 		 */
-		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
+		if (Irel == NULL && lvstate->dtctl->dt_count > 0)
 		{
 			/* Remove tuples from heap */
-			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+			lazy_vacuum_page(onerel, blkno, buf, 0, lvstate, &vmbuffer);
 			has_dead_tuples = false;
 
 			/*
@@ -1146,7 +1303,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lvstate->dtctl->dt_count = 0;
+
 			vacuumed_pages++;
 		}
 
@@ -1249,7 +1407,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * page, so remember its free space as-is.  (This path will always be
 		 * taken if there are no indexes.)
 		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		if (lvstate->dtctl->dt_count == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
@@ -1264,10 +1422,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_dead_tuples = nkeep;
 
 	/* now we can compute the new value for pg_class.reltuples */
-	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
-														 nblocks,
-														 vacrelstats->tupcount_pages,
-														 num_tuples);
+	if (!lvstate->parallel_mode)
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 nblocks,
+															 vacrelstats->tupcount_pages,
+															 num_tuples);
 
 	/*
 	 * Release any remaining pin on visibility map page.
@@ -1280,13 +1439,25 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (vacrelstats->num_dead_tuples > 0)
+	if (lvstate->dtctl->dt_count > 0)
 	{
 		const int	hvp_index[] = {
 			PROGRESS_VACUUM_PHASE,
 			PROGRESS_VACUUM_NUM_INDEX_VACUUMS
 		};
 		int64		hvp_val[2];
+#ifdef PLV_TIME
+		elog(WARNING, "%d Scan %s", ParallelWorkerNumber, pg_rusage_show(&ru_scan));
+#endif
+		/*
+		 * Here we're about to vacuum the table and indexes actually. Before
+		 * entering vacuum state, we have to wait for other vacuum worker to
+		 * reach here.
+		 */
+		lazy_prepare_vacuum(lvstate);
+#ifdef PLV_TIME
+		pg_rusage_init(&ru_vacuum);
+#endif
 
 		/* Log cleanup info before we touch indexes */
 		vacuum_log_cleanup_info(onerel, vacrelstats);
@@ -1297,9 +1468,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/* Remove index entries */
 		for (i = 0; i < nindexes; i++)
-			lazy_vacuum_index(Irel[i],
-							  &indstats[i],
-							  vacrelstats);
+		{
+			if (IsAssignedIndex(i, lvstate->pstate))
+				lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+		}
 
 		/* Report that we are now vacuuming the heap */
 		hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
@@ -1309,8 +1481,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		/* Remove tuples from heap */
 		pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 									 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
-		lazy_vacuum_heap(onerel, vacrelstats);
+
+		lazy_vacuum_heap(onerel, lvstate);
+
 		vacrelstats->num_index_scans++;
+#ifdef PLV_TIME
+		elog(WARNING, "%d VACUUM : %s", ParallelWorkerNumber, pg_rusage_show(&ru_vacuum));
+#endif
 	}
 
 	/* report all blocks vacuumed; and that we're cleaning up */
@@ -1320,7 +1497,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
-		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+	{
+		if (IsAssignedIndex(i, lvstate->pstate))
+			lazy_cleanup_index(Irel[i], indstats[i], lvstate->vacrelstats,
+							   lvstate->parallel_mode ? &(lvstate->indstats[i]) : NULL);
+	}
 
 	/* If no indexes, make log report that lazy_vacuum_heap would've made */
 	if (vacuumed_pages)
@@ -1329,12 +1510,16 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						RelationGetRelationName(onerel),
 						tups_vacuumed, vacuumed_pages)));
 
+	lv_endscan(lvscan);
+
 	/*
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
 	 */
 	initStringInfo(&buf);
 	appendStringInfo(&buf,
+					 "------- worker %d TOTAL stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
 					 _("%.0f dead row versions cannot be removed yet, oldest xmin: %u\n"),
 					 nkeep, OldestXmin);
 	appendStringInfo(&buf, _("There were %.0f unused item pointers.\n"),
@@ -1362,6 +1547,35 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pfree(buf.data);
 }
 
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+lazy_gather_vacuum_stats(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	int	i;
+	LVRelStats *lvstats_list;
+
+	lvstats_list = (LVRelStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_VACUUM_STATS, false);
+
+	/* Gather each worker stats */
+	for (i = 0; i < pcxt->nworkers_launched; i++)
+	{
+		LVRelStats *wstats = (LVRelStats*) ((char *) lvstats_list + sizeof(LVRelStats) * i);
+
+		vacrelstats->scanned_pages += wstats->scanned_pages;
+		vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+		vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+		vacrelstats->scanned_tuples += wstats->scanned_tuples;
+		vacrelstats->new_dead_tuples += wstats->new_dead_tuples;
+		vacrelstats->pages_removed += wstats->pages_removed;
+		vacrelstats->tuples_deleted += wstats->tuples_deleted;
+		vacrelstats->nonempty_pages += wstats->nonempty_pages;
+	}
+
+	/* all vacuum workers have same value of rel_pages */
+	vacrelstats->rel_pages = lvstats_list->rel_pages;
+}
 
 /*
  *	lazy_vacuum_heap() -- second pass over the heap
@@ -1375,18 +1589,24 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
  * process index entry removal in batches as large as possible.
  */
 static void
-lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
+lazy_vacuum_heap(Relation onerel, LVState *lvstate)
 {
 	int			tupindex;
 	int			npages;
 	PGRUsage	ru0;
+	BlockNumber	prev_tblk;
 	Buffer		vmbuffer = InvalidBuffer;
+	ItemPointer	deadtuples = lvstate->deadtuples;
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+	BlockNumber	ntuples = 0;
+	StringInfoData	buf;
 
 	pg_rusage_init(&ru0);
 	npages = 0;
 
 	tupindex = 0;
-	while (tupindex < vacrelstats->num_dead_tuples)
+
+	while (tupindex < dtctl->dt_count)
 	{
 		BlockNumber tblk;
 		Buffer		buf;
@@ -1395,7 +1615,40 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 
 		vacuum_delay_point();
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		/*
+		 * If the dead tuple TIDs are shared with all vacuum workers,
+		 * we acquire the lock and advance tupindex before vacuuming.
+		 *
+		 * NB: The number of maximum tuple can be stored into single
+		 * page is not a large number in most cases. We can use spinlock
+		 * here.
+		 */
+		if (IsDeadTupleShared(lvstate))
+		{
+			SpinLockAcquire(&(dtctl->mutex));
+
+			tupindex = dtctl->dt_index;
+
+			if (tupindex >= dtctl->dt_count)
+			{
+				SpinLockRelease(&(dtctl->mutex));
+				break;
+			}
+
+			/* Advance dtct->dt_index */
+			prev_tblk = tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
+			while(prev_tblk == tblk &&
+				  dtctl->dt_index < dtctl->dt_count)
+			{
+				tblk = ItemPointerGetBlockNumber(&deadtuples[dtctl->dt_index]);
+				dtctl->dt_index++;
+				ntuples++;
+			}
+
+			SpinLockRelease(&(dtctl->mutex));
+		}
+
+		tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vac_strategy);
 		if (!ConditionalLockBufferForCleanup(buf))
@@ -1404,7 +1657,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 			++tupindex;
 			continue;
 		}
-		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, lvstate,
 									&vmbuffer);
 
 		/* Now that we've compacted the page, record its available space */
@@ -1422,10 +1675,17 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		vmbuffer = InvalidBuffer;
 	}
 
+#ifdef PLV_TIME
+	elog(WARNING, "%d TABLE %s", ParallelWorkerNumber, pg_rusage_show(&ru0));
+#endif
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "------- worker %d VACUUM HEAP stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
+					 "\"%s\": removed %d row versions in %d pages",
+					 RelationGetRelationName(onerel), ntuples, npages);
 	ereport(elevel,
-			(errmsg("\"%s\": removed %d row versions in %d pages",
-					RelationGetRelationName(onerel),
-					tupindex, npages),
+			(errmsg("%s", buf.data),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1435,34 +1695,32 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
  *
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
- * tuple for this page.  We assume the rest follow sequentially.
- * The return value is the first tupindex after the tuples of this page.
  */
 static int
 lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxOffsetNumber];
 	int			uncnt = 0;
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+	for (; tupindex < lvstate->dtctl->dt_count; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&lvstate->deadtuples[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&lvstate->deadtuples[tupindex]);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
@@ -1587,14 +1845,15 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
  *	lazy_vacuum_index() -- vacuum one index relation.
  *
  *		Delete all the index entries pointing to tuples listed in
- *		vacrelstats->dead_tuples, and update running statistics.
+ *		lvstate->deadtuples, and update running statistics.
  */
 static void
 lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats)
+				  LVState	*lvstate)
 {
 	IndexVacuumInfo ivinfo;
+	StringInfoData buf;
 	PGRUsage	ru0;
 
 	pg_rusage_init(&ru0);
@@ -1603,17 +1862,25 @@ lazy_vacuum_index(Relation indrel,
 	ivinfo.analyze_only = false;
 	ivinfo.estimated_count = true;
 	ivinfo.message_level = elevel;
-	ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
+	ivinfo.num_heap_tuples = lvstate->vacrelstats->old_rel_tuples;
 	ivinfo.strategy = vac_strategy;
 
 	/* Do bulk deletion */
-	*stats = index_bulk_delete(&ivinfo, *stats,
-							   lazy_tid_reaped, (void *) vacrelstats);
+	*stats = index_bulk_delete(&ivinfo, *stats, lazy_tid_reaped, (void *) lvstate);
+
+#ifdef PLV_TIME
+	elog(WARNING, "%d INDEX(%d) %s", ParallelWorkerNumber, RelationGetRelid(indrel),
+		 pg_rusage_show(&ru0));
+#endif
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "------- worker %d VACUUM INDEX stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
+					 "scanned index \"%s\" to remove %d row versions",
+					 RelationGetRelationName(indrel), lvstate->dtctl->dt_count);
 
 	ereport(elevel,
-			(errmsg("scanned index \"%s\" to remove %d row versions",
-					RelationGetRelationName(indrel),
-					vacrelstats->num_dead_tuples),
+			(errmsg("%s", buf.data),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1621,11 +1888,11 @@ lazy_vacuum_index(Relation indrel,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  */
 static void
-lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats)
+lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat)
 {
 	IndexVacuumInfo ivinfo;
+	StringInfoData	buf;
 	PGRUsage	ru0;
 
 	pg_rusage_init(&ru0);
@@ -1639,34 +1906,54 @@ lazy_cleanup_index(Relation indrel,
 
 	stats = index_vacuum_cleanup(&ivinfo, stats);
 
+	/* Will be updated by leader process after vacuumed */
+	if (indstat)
+		indstat->updated = false;
+
 	if (!stats)
 		return;
 
 	/*
 	 * Now update statistics in pg_class, but only if the index says the count
-	 * is accurate.
+	 * is accurate. In parallel lazy vacuum, the worker can not update these
+	 * information by itself, so save to DSM and then the launcher process
+	 * updates it later.
 	 */
 	if (!stats->estimated_count)
-		vac_update_relstats(indrel,
-							stats->num_pages,
-							stats->num_index_tuples,
-							0,
-							false,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							false);
+	{
+		if (indstat)
+		{
+			indstat->updated = true;
+			indstat->num_pages = stats->num_pages;
+			indstat->num_tuples = stats->num_index_tuples;
+		}
+		else
+			vac_update_relstats(indrel,
+								stats->num_pages,
+								stats->num_index_tuples,
+								0,
+								false,
+								InvalidTransactionId,
+								InvalidMultiXactId,
+								false);
+	}
 
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "------- worker %d CLEANUP INDEX stats -------\n", ParallelWorkerNumber);
+	appendStringInfo(&buf,
+					 "index \"%s\" now contains %.0f row versions in %u pages",
+					 RelationGetRelationName(indrel),
+					 stats->num_index_tuples,
+					 stats->num_pages);
 	ereport(elevel,
-			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
-					RelationGetRelationName(indrel),
-					stats->num_index_tuples,
-					stats->num_pages),
-			 errdetail("%.0f index row versions were removed.\n"
-					   "%u index pages have been deleted, %u are currently reusable.\n"
-					   "%s.",
-					   stats->tuples_removed,
-					   stats->pages_deleted, stats->pages_free,
-					   pg_rusage_show(&ru0))));
+			(errmsg("%s", buf.data),
+					errdetail("%.0f index row versions were removed.\n"
+							  "%u index pages have been deleted, %u are currently reusable.\n"
+							  "%s.",
+							  stats->tuples_removed,
+							  stats->pages_deleted, stats->pages_free,
+							  pg_rusage_show(&ru0))));
 
 	pfree(stats);
 }
@@ -1976,59 +2263,69 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 /*
  * lazy_space_alloc - space allocation decisions for lazy vacuum
  *
+ * In parallel lazy vacuum the space for dead tuple locations are already
+ * allocated in dynamic shared memory, so we allocate space for dead tuple
+ * locations in local memory only when in not parallel lazy vacuum and set
+ * MyDeadTuple.
+ *
  * See the comments at the head of this file for rationale.
  */
 static void
-lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
+lazy_space_alloc(LVState *lvstate, BlockNumber relblocks)
 {
-	long		maxtuples;
-	int			vac_work_mem = IsAutoVacuumWorkerProcess() &&
-	autovacuum_work_mem != -1 ?
-	autovacuum_work_mem : maintenance_work_mem;
+	long maxtuples;
 
-	if (vacrelstats->hasindex)
-	{
-		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
-		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+	/*
+	 * In parallel mode, we already set the pointer to dead tuple
+	 * array when initialize.
+	 */
+	if (lvstate->parallel_mode && lvstate->vacrelstats->nindexes > 0)
+		return;
 
-		/* curious coding here to ensure the multiplication can't overflow */
-		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
-			maxtuples = relblocks * LAZY_ALLOC_TUPLES;
+	maxtuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
 
-		/* stay sane if small maintenance_work_mem */
-		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
-	}
-	else
-	{
-		maxtuples = MaxHeapTuplesPerPage;
-	}
-
-	vacrelstats->num_dead_tuples = 0;
-	vacrelstats->max_dead_tuples = (int) maxtuples;
-	vacrelstats->dead_tuples = (ItemPointer)
-		palloc(maxtuples * sizeof(ItemPointerData));
+	/*
+	 * If in not parallel lazy vacuum, we need to allocate dead
+	 * tuple array in local memory.
+	 */
+	lvstate->deadtuples = palloc0(sizeof(ItemPointerData) * (int)maxtuples);
+	lvstate->dtctl = (LVDeadTupleCtl *) palloc(sizeof(LVDeadTupleCtl));
+	lvstate->dtctl->dt_max = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+	lvstate->dtctl->dt_count = 0;
 }
 
 /*
  * lazy_record_dead_tuple - remember one deletable tuple
+ *
+ * Acquiring the spinlock before remember is required if the dead tuple
+ * TIDs are shared with other vacuum workers.
  */
 static void
-lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr)
+lazy_record_dead_tuple(LVState *lvstate, ItemPointer itemptr)
 {
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockAcquire(&(dtctl->mutex));
+
+	if (dtctl->dt_count >= dtctl->dt_max)
+		elog(ERROR, "dead tuple array overflow");
+
 	/*
 	 * The array must never overflow, since we rely on all deletable tuples
 	 * being removed; inability to remove a tuple might cause an old XID to
 	 * persist beyond the freeze limit, which could be disastrous later on.
 	 */
-	if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples)
-		elog(ERROR, "dead tuple array overflow");
+	if (dtctl->dt_count < dtctl->dt_max)
+	{
 
-	vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
-	vacrelstats->num_dead_tuples++;
-	pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-								 vacrelstats->num_dead_tuples);
+		lvstate->deadtuples[dtctl->dt_count] = *itemptr;
+		(dtctl->dt_count)++;
+		/* XXX : Update progress information here */
+	}
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockRelease(&(dtctl->mutex));
 }
 
 /*
@@ -2041,16 +2338,23 @@ lazy_record_dead_tuple(LVRelStats *vacrelstats,
 static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
-	LVRelStats *vacrelstats = (LVRelStats *) state;
+	LVState *lvstate = (LVState *) state;
 	ItemPointer res;
 
+	/*
+	 * We can assume that the dead tuple TIDs are sorted by TID location
+	 * even when we shared the dead tuple TIDs with other vacuum workers.
+	 */
 	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) vacrelstats->dead_tuples,
-								vacrelstats->num_dead_tuples,
+								(void *) lvstate->deadtuples,
+								lvstate->dtctl->dt_count,
 								sizeof(ItemPointerData),
 								vac_cmp_itemptr);
 
-	return (res != NULL);
+	if (res != NULL)
+		return true;
+
+	return false;
 }
 
 /*
@@ -2194,3 +2498,622 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 	return all_visible;
 }
+
+/*
+ * Return the block number we need to scan next, or InvalidBlockNumber if scan
+ * is done.
+ *
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
+ * sequentially, the OS should be doing readahead for us, so there's no
+ * gain in skipping a page now and then; that's likely to disable
+ * readahead and so be counterproductive. Also, skipping even a single
+ * page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * Before entering the main loop, establish the invariant that
+ * next_unskippable_block is the next block number >= blkno that we can't
+ * skip based on the visibility map, either all-visible for a regular scan
+ * or all-frozen for an aggressive scan.  We set it to nblocks if there's
+ * no such block.  We also set up the skipping_blocks flag correctly at
+ * this stage.
+ *
+ * In not parallel mode, before entering the main loop, establish the
+ * invariant that next_unskippable_block is the next block number >= blkno
+ * that's not we can't skip based on the visibility map, either all-visible
+ * for a regular scan or all-frozen for an aggressive scan.  We set it to
+ * nblocks if there's no such block.  We also set up the skipping_blocks
+ * flag correctly at this stage.
+ *
+ * In parallel mode, pstate is not NULL. We scan heap pages
+ * using parallel heap scan description. Each worker calls heap_parallelscan_nextpage()
+ * in order to exclusively get  block number we need to scan at next.
+ * If given block is all-visible according to visibility map, we skip to
+ * scan this block immediately unlike not parallel lazy scan.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding
+ * heap page or locking the buffer.  This is OK.  If we mistakenly think
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page.  It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them.  But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared.  Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * Computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
+ *
+ * We will scan the table's last page, at least to the extent of
+ * determining whether it has tuples or not, even if it should be skipped
+ * according to the above rules; except when we've already determined that
+ * it's not worth trying to truncate the table.  This avoids having
+ * lazy_truncate_heap() take access-exclusive lock on the table to attempt
+ * a truncation that just fails immediately because there are tuples in
+ * the last page.  This is worth avoiding mainly because such a lock must
+ * be replayed on any hot standby, where it can be disruptive.
+ */
+static BlockNumber
+lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+					   LVScanDesc lvscan, bool *all_visible_according_to_vm,
+					   Buffer *vmbuffer, int options, bool aggressive)
+{
+	BlockNumber blkno;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+
+	if (lvstate->parallel_mode)
+	{
+		/*
+		 * In parallel lazy vacuum since it's hard to know how many consecutive
+		 * all-visible pages exits on table we skip to scan the heap page immediately.
+		 * if it is all-visible page.
+		 */
+		while ((blkno = heap_parallelscan_nextpage(lvscan->heapscan)) != InvalidBlockNumber)
+		{
+			*all_visible_according_to_vm = false;
+			vacuum_delay_point();
+
+			/* Consider to skip scan page according visibility map */
+			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0 &&
+				!FORCE_CHECK_PAGE(blkno))
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, blkno, vmbuffer);
+
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+					{
+						vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+					else if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+						*all_visible_according_to_vm = true;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+					{
+						if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+							vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+				}
+			}
+
+			/* We need to scan current blkno, break */
+			break;
+		}
+	}
+	else
+	{
+		bool skipping_blocks = false;
+
+		/* Initialize lv_nextunskippable_page if needed */
+		if (lvscan->lv_cblock == 0 && (options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+		{
+			while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, lvscan->lv_next_unskippable_block,
+													vmbuffer);
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+						break;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+						break;
+				}
+				vacuum_delay_point();
+				lvscan->lv_next_unskippable_block++;
+			}
+
+			if (lvscan->lv_next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+				skipping_blocks = true;
+			else
+				skipping_blocks = false;
+		}
+
+		/* Decide the block number we need to scan */
+		for (blkno = lvscan->lv_cblock; blkno < lvscan->lv_nblocks; blkno++)
+		{
+			if (blkno == lvscan->lv_next_unskippable_block)
+			{
+				/* Time to advance next_unskippable_block */
+				lvscan->lv_next_unskippable_block++;
+				if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+				{
+					while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+					{
+						uint8		vmstatus;
+
+						vmstatus = visibilitymap_get_status(onerel,
+															lvscan->lv_next_unskippable_block,
+															vmbuffer);
+						if (aggressive)
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+								break;
+						}
+						else
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+								break;
+						}
+						vacuum_delay_point();
+						lvscan->lv_next_unskippable_block++;
+					}
+				}
+
+				/*
+				 * We know we can't skip the current block.  But set up
+				 * skipping_all_visible_blocks to do the right thing at the
+				 * following blocks.
+				 */
+				if (lvscan->lv_next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
+					skipping_blocks = true;
+				else
+					skipping_blocks = false;
+
+				/*
+				 * Normally, the fact that we can't skip this block must mean that
+				 * it's not all-visible.  But in an aggressive vacuum we know only
+				 * that it's not all-frozen, so it might still be all-visible.
+				 */
+				if (aggressive && VM_ALL_VISIBLE(onerel, blkno, vmbuffer))
+					*all_visible_according_to_vm = true;
+
+				/* Found out that next unskippable block number */
+				break;
+			}
+			else
+			{
+				/*
+				 * The current block is potentially skippable; if we've seen a
+				 * long enough run of skippable blocks to justify skipping it, and
+				 * we're not forced to check it, then go ahead and skip.
+				 * Otherwise, the page must be at least all-visible if not
+				 * all-frozen, so we can set all_visible_according_to_vm = true.
+				 */
+				if (skipping_blocks && !FORCE_CHECK_PAGE(blkno))
+				{
+					/*
+					 * Tricky, tricky.  If this is in aggressive vacuum, the page
+					 * must have been all-frozen at the time we checked whether it
+					 * was skippable, but it might not be any more.  We must be
+					 * careful to count it as a skipped all-frozen page in that
+					 * case, or else we'll think we can't update relfrozenxid and
+					 * relminmxid.  If it's not an aggressive vacuum, we don't
+					 * know whether it was all-frozen, so we have to recheck; but
+					 * in this case an approximate answer is OK.
+					 */
+					if (aggressive || VM_ALL_FROZEN(onerel, blkno, vmbuffer))
+						vacrelstats->frozenskipped_pages++;
+					continue;
+				}
+
+				*all_visible_according_to_vm = true;
+
+				/* We need to scan current blkno, break */
+				break;
+			}
+		} /* for */
+
+		/* Advance the current block number for the next scan */
+		lvscan->lv_cblock = blkno + 1;
+	}
+
+	return (blkno == lvscan->lv_nblocks) ? InvalidBlockNumber : blkno;
+}
+
+/*
+ * Begin lazy vacuum scan. lvscan->heapscan is NULL if
+ * we're not in parallel lazy vacuum.
+ */
+static LVScanDesc
+lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan)
+{
+	LVScanDesc lvscan;
+
+	lvscan = (LVScanDesc) palloc(sizeof(LVScanDescData));
+
+	lvscan->lv_cblock = 0;
+	lvscan->lv_next_unskippable_block = 0;
+	lvscan->lv_nblocks = RelationGetNumberOfBlocks(onerel);
+
+	if (pscan != NULL)
+		lvscan->heapscan = heap_beginscan_parallel(onerel, pscan);
+	else
+		lvscan->heapscan = NULL;
+
+	return lvscan;
+}
+
+/*
+ * End lazy vacuum scan.
+ */
+static void
+lv_endscan(LVScanDesc lvscan)
+{
+	if (lvscan->heapscan != NULL)
+		heap_endscan(lvscan->heapscan);
+	pfree(lvscan);
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Lazy Vacuum Support
+ * ----------------------------------------------------------------
+ */
+
+/*
+ * Estimate storage for parallel lazy vacuum.
+ */
+static void
+lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	Size size = 0;
+	int keys = 0;
+	int vacuum_workers = pcxt->nworkers + 1;
+	long maxtuples = lazy_get_max_dead_tuples(vacrelstats);
+
+	/* Estimate size for parallel heap scan */
+	size += heap_parallelscan_estimate(SnapshotAny);
+	keys++;
+
+	/* Estimate size for vacuum statistics for only workers*/
+	size += BUFFERALIGN(mul_size(sizeof(LVRelStats), pcxt->nworkers));
+	keys++;
+
+	/* We have to share dead tuple information only when the table has indexes */
+	if (vacrelstats->nindexes > 0)
+	{
+		/* Estimate size for index statistics */
+		size += BUFFERALIGN(mul_size(sizeof(LVIndStats), vacrelstats->nindexes));
+		keys++;
+
+		/* Estimate size for dead tuple control */
+		size += BUFFERALIGN(sizeof(LVDeadTupleCtl));
+		keys++;
+
+		/* Estimate size for dead tuple array */
+		size += BUFFERALIGN(mul_size(
+							 mul_size(sizeof(ItemPointerData), maxtuples),
+							 vacuum_workers));
+		keys++;
+	}
+
+	/* Estimate size for parallel lazy vacuum state */
+	size += BUFFERALIGN(sizeof(LVParallelState));
+	keys++;
+
+	/* Estimate size for vacuum task */
+	size += BUFFERALIGN(sizeof(VacuumInfo));
+	keys++;
+
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, keys);
+}
+
+/*
+ * Initialize dynamic shared memory for parallel lazy vacuum. We store
+ * relevant informations of parallel heap scanning, dead tuple array
+ * and vacuum statistics for each worker and some parameters for lazy vacuum.
+ */
+static void
+lazy_initialize_dsm(ParallelContext *pcxt, Relation onerel, LVState *lvstate,
+					int options, bool aggressive)
+{
+	ParallelHeapScanDesc pscan_ptr;
+	ItemPointer	deadtuples_ptr;
+	char 		*lvrelstats_ptr;
+	LVParallelState *pstate_ptr;
+	LVIndStats	*indstats_ptr;
+	LVDeadTupleCtl	*dtctl_ptr;
+	int i;
+	int deadtuples_size;
+	int lvrelstats_size;
+	int	vacuum_workers = pcxt->nworkers + 1;
+	long max_tuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+
+	/* Allocate and initialize DSM for vacuum stats for each worker */
+	lvrelstats_size = mul_size(sizeof(LVRelStats), pcxt->nworkers);
+	lvrelstats_ptr = shm_toc_allocate(pcxt->toc, lvrelstats_size);
+	for (i = 0; i < pcxt->nworkers; i++)
+	{
+		char *start;
+
+		start = lvrelstats_ptr + i * sizeof(LVRelStats);
+		memcpy(start, lvstate->vacrelstats, sizeof(LVRelStats));
+	}
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_STATS, lvrelstats_ptr);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Allocate and initialize DSM for dead tuple control */
+		dtctl_ptr = (LVDeadTupleCtl *) shm_toc_allocate(pcxt->toc, sizeof(LVDeadTupleCtl));
+		SpinLockInit(&(dtctl_ptr->mutex));
+		dtctl_ptr->dt_max = max_tuples * vacuum_workers;
+		dtctl_ptr->dt_count = 0;
+		dtctl_ptr->dt_index = 0;
+		lvstate->dtctl = dtctl_ptr;
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLE_CTL, dtctl_ptr);
+
+		/* Allocate and initialize DSM for dead tuple array */
+		deadtuples_size = mul_size(mul_size(sizeof(ItemPointerData), max_tuples),
+								   vacuum_workers);
+		deadtuples_ptr = (ItemPointer) shm_toc_allocate(pcxt->toc,
+														deadtuples_size);
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLES, deadtuples_ptr);
+		lvstate->deadtuples = deadtuples_ptr;
+
+		/* Allocate DSM for index statistics */
+		indstats_ptr = (LVIndStats *) shm_toc_allocate(pcxt->toc,
+													   mul_size(sizeof(LVIndStats),
+																lvstate->vacrelstats->nindexes));
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_INDEX_STATS, indstats_ptr);
+		lvstate->indstats = indstats_ptr;
+	}
+
+	/* Allocate and initialize DSM for parallel scan description */
+	pscan_ptr = (ParallelHeapScanDesc) shm_toc_allocate(pcxt->toc,
+														heap_parallelscan_estimate(SnapshotAny));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_SCAN, pscan_ptr);
+	heap_parallelscan_initialize(pscan_ptr, onerel, SnapshotAny);
+	lvstate->pscan = pscan_ptr;
+
+	/* Allocate and initialize DSM for parallel vacuum state */
+	pstate_ptr = (LVParallelState *) shm_toc_allocate(pcxt->toc, sizeof(LVParallelState));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_STATE, pstate_ptr);
+
+	ConditionVariableInit(&(pstate_ptr->cv));
+	SpinLockInit(&(pstate_ptr->mutex));
+	pstate_ptr->nworkers = vacuum_workers;
+	pstate_ptr->state = VACSTATE_SCAN;
+	pstate_ptr->info.aggressive = aggressive;
+	pstate_ptr->info.options = options;
+	pstate_ptr->info.oldestxmin = OldestXmin;
+	pstate_ptr->info.freezelimit = FreezeLimit;
+	pstate_ptr->info.multixactcutoff = MultiXactCutoff;
+	pstate_ptr->info.elevel = elevel;
+	lvstate->pstate = pstate_ptr;
+}
+
+/*
+ * Initialize parallel lazy vacuum for worker.
+ */
+static LVState *
+lazy_initialize_worker(shm_toc *toc)
+{
+	LVState	*lvstate;
+	char *lvstats;
+
+	lvstate = (LVState *) palloc(sizeof(LVState));
+	lvstate->parallel_mode = true;
+
+	/* Set up vacuum stats */
+	lvstats = shm_toc_lookup(toc, VACUUM_KEY_VACUUM_STATS, false);
+	lvstate->vacrelstats = (LVRelStats *) (lvstats +
+										   sizeof(LVRelStats) * ParallelWorkerNumber);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Set up dead tuple control */
+		lvstate->dtctl = (LVDeadTupleCtl *) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLE_CTL, false);
+
+		/* Set up dead tuple array */
+		lvstate->deadtuples = (ItemPointer) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLES, false);
+
+		/* Set up index statistics */
+		lvstate->indstats = (LVIndStats *) shm_toc_lookup(toc, VACUUM_KEY_INDEX_STATS, false);
+	}
+
+	/* Set up parallel vacuum state */
+	lvstate->pstate = (LVParallelState *) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_STATE, false);
+
+	/* Set up parallel heap scan description */
+	lvstate->pscan = (ParallelHeapScanDesc) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_SCAN, false);
+
+	/* Set up parameters for lazy vacuum */
+	OldestXmin = lvstate->pstate->info.oldestxmin;
+	FreezeLimit = lvstate->pstate->info.freezelimit;
+	MultiXactCutoff = lvstate->pstate->info.multixactcutoff;
+	elevel = lvstate->pstate->info.elevel;
+
+	return lvstate;
+}
+
+/*
+ * In the end of actual vacuumming on table and indexes actually, we have
+ * to wait for other all vacuum workers to reach here before clearing dead
+ * tuple TIDs information.
+ */
+static void
+lazy_end_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+	{
+		lvstate->dtctl->dt_count = 0;
+		return;
+	}
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		/* Fetch shared information */
+		if (!counted)
+			pstate->finish_count++;
+		finish_count = pstate->finish_count;
+		state = pstate->state;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_SCAN)
+			break;
+
+		/*
+		 * Wake up other workers if counted up if first time to reach here and
+		 * is a parallel worker.
+		 */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * If all launched parallel vacuum workers reached here, we can clear the
+		 * dead tuple TIDs information.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			/* Clear dead tuples */
+			lvstate->dtctl->dt_count = 0;
+
+			/* need spinlock ? */
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_SCAN;
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_DONE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Before starting actual vacuuming on table and indexes, we have to wait for
+ * other all vacuum workers so that all worker can see the same dead tuple TIDs
+ * information when vacuuming.
+ */
+static void
+lazy_prepare_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+		return;
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		if (!counted)
+			pstate->finish_count++;
+		state = pstate->state;
+		finish_count = pstate->finish_count;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_VACUUM)
+			break;
+
+		/*
+		 * Wake up other workers if counted up if first time to reach here and
+		 * is a parallel worker.
+		 */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * The leader process can change parallel vacuum state if all workers
+		 * have reached here.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			qsort((void *) lvstate->deadtuples, lvstate->dtctl->dt_count,
+				  sizeof(ItemPointerData), vac_cmp_itemptr);
+
+			/* XXX: need spinlock ? */
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_VACUUM;
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_PREPARE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Return the number of maximum dead tuples can be stored according
+ * to vac_work_mem.
+ */
+static long
+lazy_get_max_dead_tuples(LVRelStats *vacrelstats)
+{
+	long maxtuples;
+	int	vac_work_mem = IsAutoVacuumWorkerProcess() &&
+		autovacuum_work_mem != -1 ?
+		autovacuum_work_mem : maintenance_work_mem;
+
+	if (vacrelstats->nindexes != 0)
+	{
+		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+		maxtuples = Min(maxtuples, INT_MAX);
+		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+
+		/* curious coding here to ensure the multiplication can't overflow */
+		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > vacrelstats->old_rel_pages)
+			maxtuples = vacrelstats->old_rel_pages * LAZY_ALLOC_TUPLES;
+
+		/* stay sane if small maintenance_work_mem */
+		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+	}
+	else
+	{
+		maxtuples = MaxHeapTuplesPerPage;
+	}
+
+	return maxtuples;
+}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 7a70001..8cba9c8 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1662,7 +1662,12 @@ _equalDropdbStmt(const DropdbStmt *a, const DropdbStmt *b)
 static bool
 _equalVacuumStmt(const VacuumStmt *a, const VacuumStmt *b)
 {
-	COMPARE_SCALAR_FIELD(options);
+	if (a->options.flags != b->options.flags)
+		return false;
+
+	if (a->options.nworkers != b->options.nworkers)
+		return false;
+
 	COMPARE_NODE_FIELD(rels);
 
 	return true;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4c83a63..85f5d07 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -187,6 +187,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
 			   bool *deferrable, bool *initdeferred, bool *not_valid,
 			   bool *no_inherit, core_yyscan_t yyscanner);
 static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
+static VacuumOptions *makeVacOpt(VacuumOption flag, int nworkers);
 
 %}
 
@@ -237,6 +238,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	struct ImportQual	*importqual;
 	InsertStmt			*istmt;
 	VariableSetStmt		*vsetstmt;
+	VacuumOptions		*vacopts;
 	PartitionElem		*partelem;
 	PartitionSpec		*partspec;
 	PartitionBoundSpec	*partboundspec;
@@ -305,7 +307,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 				create_extension_opt_item alter_extension_opt_item
 
 %type <ival>	opt_lock lock_type cast_context
-%type <ival>	vacuum_option_list vacuum_option_elem
+%type <vacopts>	vacuum_option_list vacuum_option_elem
 %type <boolean>	opt_or_replace
 				opt_grant_grant_option opt_grant_admin_option
 				opt_nowait opt_if_exists opt_with_data
@@ -10152,32 +10154,40 @@ cluster_index_specification:
 VacuumStmt: VACUUM opt_full opt_freeze opt_verbose opt_vacuum_relation_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
 					n->rels = $5;
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose AnalyzeStmt
 				{
 					VacuumStmt *n = (VacuumStmt *) $5;
-					n->options |= VACOPT_VACUUM;
+					n->options.flags |= VACOPT_VACUUM;
 					if ($2)
-						n->options |= VACOPT_FULL;
+						n->options.flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						n->options.flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						n->options.flags |= VACOPT_VERBOSE;
+					n->options.nworkers = 0;
 					$$ = (Node *)n;
 				}
 			| VACUUM '(' vacuum_option_list ')' opt_vacuum_relation_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions	*vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->rels = $5;
 					$$ = (Node *) n;
 				}
@@ -10185,18 +10195,38 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose opt_vacuum_relation_list
 
 vacuum_option_list:
 			vacuum_option_elem								{ $$ = $1; }
-			| vacuum_option_list ',' vacuum_option_elem		{ $$ = $1 | $3; }
+			| vacuum_option_list ',' vacuum_option_elem
+			{
+				VacuumOptions *vacopts1 = (VacuumOptions *)$1;
+				VacuumOptions *vacopts2 = (VacuumOptions *)$3;
+
+				vacopts1->flags |= vacopts2->flags;
+				if (vacopts2->flags == VACOPT_PARALLEL)
+					vacopts1->nworkers = vacopts2->nworkers;
+
+				$$ = vacopts1;
+				pfree(vacopts2);
+			}
 		;
 
 vacuum_option_elem:
-			analyze_keyword		{ $$ = VACOPT_ANALYZE; }
-			| VERBOSE			{ $$ = VACOPT_VERBOSE; }
-			| FREEZE			{ $$ = VACOPT_FREEZE; }
-			| FULL				{ $$ = VACOPT_FULL; }
+			analyze_keyword		{ $$ = makeVacOpt(VACOPT_ANALYZE, 0); }
+			| VERBOSE			{ $$ = makeVacOpt(VACOPT_VERBOSE, 0); }
+			| FREEZE			{ $$ = makeVacOpt(VACOPT_FREEZE, 0); }
+			| FULL				{ $$ = makeVacOpt(VACOPT_FULL, 0); }
+			| PARALLEL ICONST
+				{
+					if ($2 < 1)
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("parallel vacuum degree must be more than 1"),
+								 parser_errposition(@1)));
+					$$ = makeVacOpt(VACOPT_PARALLEL, $2);
+				}
 			| IDENT
 				{
 					if (strcmp($1, "disable_page_skipping") == 0)
-						$$ = VACOPT_DISABLE_PAGE_SKIPPING;
+						$$ = makeVacOpt(VACOPT_DISABLE_PAGE_SKIPPING, 1);
 					else
 						ereport(ERROR,
 								(errcode(ERRCODE_SYNTAX_ERROR),
@@ -10208,11 +10238,16 @@ vacuum_option_elem:
 AnalyzeStmt: analyze_keyword opt_verbose opt_vacuum_relation_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->rels = $3;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 		;
 
@@ -15915,6 +15950,16 @@ makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
 	return (Node *) s;
 }
 
+static VacuumOptions *
+makeVacOpt(VacuumOption flag, int nworkers)
+{
+	VacuumOptions *vacopt = palloc(sizeof(VacuumOptions));
+
+	vacopt->flags = flag;
+	vacopt->nworkers = nworkers;
+	return vacopt;
+}
+
 /* parser_init()
  * Initialize to parse one query string
  */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c04c0b5..ee7e6be 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -189,7 +189,7 @@ typedef struct av_relation
 typedef struct autovac_table
 {
 	Oid			at_relid;
-	int			at_vacoptions;	/* bitmask of VacuumOption */
+	VacuumOptions at_vacoptions;	/* contains bitmask of VacuumOption */
 	VacuumParams at_params;
 	int			at_vacuum_cost_delay;
 	int			at_vacuum_cost_limit;
@@ -2466,7 +2466,7 @@ do_autovacuum(void)
 			 * next table in our list.
 			 */
 			HOLD_INTERRUPTS();
-			if (tab->at_vacoptions & VACOPT_VACUUM)
+			if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 				errcontext("automatic vacuum of table \"%s.%s.%s\"",
 						   tab->at_datname, tab->at_nspname, tab->at_relname);
 			else
@@ -2867,10 +2867,11 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		tab = palloc(sizeof(autovac_table));
 		tab->at_relid = relid;
 		tab->at_sharedrel = classForm->relisshared;
-		tab->at_vacoptions = VACOPT_SKIPTOAST |
+		tab->at_vacoptions.flags = VACOPT_SKIPTOAST |
 			(dovacuum ? VACOPT_VACUUM : 0) |
 			(doanalyze ? VACOPT_ANALYZE : 0) |
 			(!wraparound ? VACOPT_NOWAIT : 0);
+		tab->at_vacoptions.nworkers = 1;
 		tab->at_params.freeze_min_age = freeze_min_age;
 		tab->at_params.freeze_table_age = freeze_table_age;
 		tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
@@ -3116,10 +3117,10 @@ autovac_report_activity(autovac_table *tab)
 	int			len;
 
 	/* Report the command and possible options */
-	if (tab->at_vacoptions & VACOPT_VACUUM)
+	if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: VACUUM%s",
-				 tab->at_vacoptions & VACOPT_ANALYZE ? " ANALYZE" : "");
+				 tab->at_vacoptions.flags & VACOPT_ANALYZE ? " ANALYZE" : "");
 	else
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: ANALYZE");
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3a0b49c..93e1138 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3610,6 +3610,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
 			event_name = "ParallelBitmapScan";
 			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_PREPARE:
+			event_name = "ParallelVacuumPrepare";
+			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_DONE:
+			event_name = "ParallelVacuumDone";
+			break;
 		case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
 			event_name = "ProcArrayGroupUpdate";
 			break;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 82a707a..ead81ff 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -669,7 +669,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				VacuumStmt *stmt = (VacuumStmt *) parsetree;
 
 				/* we choose to allow this during "read only" transactions */
-				PreventCommandDuringRecovery((stmt->options & VACOPT_VACUUM) ?
+				PreventCommandDuringRecovery((stmt->options.flags & VACOPT_VACUUM) ?
 											 "VACUUM" : "ANALYZE");
 				/* forbidden in parallel mode due to CommandIsReadOnly */
 				ExecVacuum(stmt, isTopLevel);
@@ -2498,7 +2498,7 @@ CreateCommandTag(Node *parsetree)
 			break;
 
 		case T_VacuumStmt:
-			if (((VacuumStmt *) parsetree)->options & VACOPT_VACUUM)
+			if (((VacuumStmt *) parsetree)->options.flags & VACOPT_VACUUM)
 				tag = "VACUUM";
 			else
 				tag = "ANALYZE";
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 294ab70..159f2e4 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2040,7 +2040,6 @@ EstimateSnapshotSpace(Snapshot snap)
 	Size		size;
 
 	Assert(snap != InvalidSnapshot);
-	Assert(snap->satisfies == HeapTupleSatisfiesMVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
 	size = add_size(sizeof(SerializedSnapshotData),
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 4e41024..57bea54 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -132,6 +132,7 @@ extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
 							 Relation relation, Snapshot snapshot);
 extern void heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan);
 extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
+extern BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 7a7b793..e7950d5 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "access/heapam.h"
+#include "access/parallel.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -157,7 +159,7 @@ extern int	vacuum_multixact_freeze_table_age;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
-extern void vacuum(int options, List *relations, VacuumParams *params,
+extern void vacuum(VacuumOptions options, List *relations, VacuumParams *params,
 	   BufferAccessStrategy bstrategy, bool isTopLevel);
 extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
 				 int *nindexes, Relation **Irel);
@@ -187,8 +189,9 @@ extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
 
 /* in commands/vacuumlazy.c */
-extern void lazy_vacuum_rel(Relation onerel, int options,
+extern void lazy_vacuum_rel(Relation onerel, VacuumOptions options,
 				VacuumParams *params, BufferAccessStrategy bstrategy);
+extern void LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc);
 
 /* in commands/analyze.c */
 extern void analyze_rel(Oid relid, RangeVar *relation, int options,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 50eec73..d9ef28c 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3095,9 +3095,16 @@ typedef enum VacuumOption
 	VACOPT_FULL = 1 << 4,		/* FULL (non-concurrent) vacuum */
 	VACOPT_NOWAIT = 1 << 5,		/* don't wait to get lock (autovacuum only) */
 	VACOPT_SKIPTOAST = 1 << 6,	/* don't process the TOAST table, if any */
-	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7	/* don't skip any pages */
+	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7,	/* don't skip any pages */
+	VACOPT_PARALLEL = 1 << 8	/* do VACUUM parallelly */
 } VacuumOption;
 
+typedef struct VacuumOptions
+{
+	VacuumOption flags; /* OR of VacuumOption flags */
+	int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;
+
 /*
  * Info about a single target table of VACUUM/ANALYZE.
  *
@@ -3116,7 +3123,7 @@ typedef struct VacuumRelation
 typedef struct VacuumStmt
 {
 	NodeTag		type;
-	int			options;		/* OR of VacuumOption flags */
+	VacuumOptions	options;
 	List	   *rels;			/* list of VacuumRelation, or NIL for all */
 } VacuumStmt;
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 089b7c3..e87a7bc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -811,6 +811,8 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_PARALLEL_BITMAP_SCAN,
+	WAIT_EVENT_PARALLEL_VACUUM_PREPARE,
+	WAIT_EVENT_PARALLEL_VACUUM_DONE,
 	WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
 	WAIT_EVENT_CLOG_GROUP_UPDATE,
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index c440c7e..b43ef0a 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -80,6 +80,9 @@ CONTEXT:  SQL function "do_analyze" statement 1
 SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
+VACUUM (PARALLEL 2, DISABLE_PAGE_SKIPPING) vactst;
+VACUUM (PARALLEL 2, FREEZE) vactst;
 -- partitioned table
 CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
 CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 92eaca2..ab9bc4c 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -61,6 +61,9 @@ VACUUM FULL vaccluster;
 VACUUM FULL vactst;
 
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
+VACUUM (PARALLEL 2, DISABLE_PAGE_SKIPPING) vactst;
+VACUUM (PARALLEL 2, FREEZE) vactst;
 
 -- partitioned table
 CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);

#45

Masahiko Sawada

sawada.mshk@gmail.com

about 8 years ago

In reply to: Masahiko Sawada (#44)

1 attachment(s)

Re: Block level parallel vacuum WIP

On Thu, Oct 5, 2017 at 10:50 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Sep 19, 2017 at 4:31 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

On Tue, Sep 19, 2017 at 3:33 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:

On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Since v4 patch conflicts with current HEAD I attached the latest version patch.

Hi Sawada-san,

Here is an interesting failure with this patch:

test rowsecurity ... FAILED
test rules ... FAILED

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

Thank you for letting me know.

Hmm, it's an interesting failure. I'll investigate it and post the new patch.

Since the patch conflicts with current HEAD, I've rebased the patch
and fixed a bug. Please review it.

Attached latest version patch. I updated the followings since the
previous version.
* Support vacuum progress reporting.
* Enable to set the parallel vacuum degree per table for autovacuum (0
by default)
* Fix bugs

I've done implementation of all parts that I attempted to support, so
this patch now is no longer WIP patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachments:

parallel_vacuum_v6.patchapplication/octet-stream; name=parallel_vacuum_v6.patchDownload

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index d157958..68b75b7 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1324,6 +1324,15 @@ FROM ( { <replaceable class="parameter">numeric_literal</replaceable> | <replace
    </varlistentry>
 
    <varlistentry>
+    <term><literal>autovacuum_vacuum_parallel_workers</literal>, <literal>toast.autovacuum_multixact_freeze_max_age</literal> (<type>integer</type>)</term>
+    <listitem>
+     <para>
+      This sets the number of worker that can be used to vacuum for this table. If not set, the autovacuum performs with no workers (non-parallel).
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>autovacuum_multixact_freeze_table_age</literal>, <literal>toast.autovacuum_multixact_freeze_table_age</literal> (<type>integer</type>)</term>
     <listitem>
      <para>
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index f5bc87e..b504741 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -30,6 +30,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     FREEZE
     VERBOSE
     ANALYZE
+    PARALLEL
     DISABLE_PAGE_SKIPPING
 
 <phrase>and <replaceable class="parameter">table_and_columns</replaceable> is:</phrase>
@@ -142,6 +143,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
    </varlistentry>
 
    <varlistentry>
+    <term><literal>PARALLEL <replaceable class="PARAMETER">N</replaceable></literal></term>
+    <listitem>
+     <para>
+      Execute <command>VACUUM</command> in parallel by <replaceable class="PARAMETER">N
+      </replaceable> background workers. Collecting garbage on table is processed
+      in block-level parallel. For tables with indexes, parallel vacuum assigns each
+      index to each parallel vacuum worker and all garbages on a index are processed
+      by particular parallel vacuum worker. This option can not use with <literal>FULL</>
+      option.
+     </para>
+    </listitem>
+   </varlistentry>
+
+   <varlistentry>
     <term><literal>DISABLE_PAGE_SKIPPING</literal></term>
     <listitem>
      <para>
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index ec10762..110a41b 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -283,6 +283,14 @@ static relopt_int intRelOpts[] =
 	},
 	{
 		{
+			"autovacuum_vacuum_parallel_workers",
+			"Number of parallel processes that can be used to vacuum for this relation",
+			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+			ShareUpdateExclusiveLock
+		}, -1, 0, 1024
+	},
+	{
+		{
 			"log_autovacuum_min_duration",
 			"Sets the minimum execution time above which autovacuum actions will be logged",
 			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
@@ -1342,6 +1350,8 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, multixact_freeze_max_age)},
 		{"autovacuum_multixact_freeze_table_age", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, multixact_freeze_table_age)},
+		{"autovacuum_vacuum_parallel_workers", RELOPT_TYPE_INT,
+		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, vacuum_parallel_workers)},
 		{"log_autovacuum_min_duration", RELOPT_TYPE_INT,
 		offsetof(StdRdOptions, autovacuum) + offsetof(AutoVacOpts, log_min_duration)},
 		{"autovacuum_vacuum_scale_factor", RELOPT_TYPE_REAL,
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 52dda41..113495e 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -90,8 +90,6 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
 						bool is_bitmapscan,
 						bool is_samplescan,
 						bool temp_snap);
-static void heap_parallelscan_startblock_init(HeapScanDesc scan);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
 					TransactionId xid, CommandId cid, int options);
 static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1667,7 +1665,7 @@ heap_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
  *		only to set the startblock once.
  * ----------------
  */
-static void
+void
 heap_parallelscan_startblock_init(HeapScanDesc scan)
 {
 	BlockNumber sync_startpage = InvalidBlockNumber;
@@ -1715,7 +1713,7 @@ retry:
  *		first backend gets an InvalidBlockNumber return.
  * ----------------
  */
-static BlockNumber
+BlockNumber
 heap_parallelscan_nextpage(HeapScanDesc scan)
 {
 	BlockNumber page;
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index d683050..7f79341 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -20,6 +20,7 @@
 #include "access/xlog.h"
 #include "catalog/namespace.h"
 #include "commands/async.h"
+#include "commands/vacuum.h"
 #include "executor/execParallel.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -122,6 +123,9 @@ static const struct
 {
 	{
 		"ParallelQueryMain", ParallelQueryMain
+	},
+	{
+		"LazyVacuumWorkerMain", LazyVacuumWorkerMain
 	}
 };
 
@@ -1125,6 +1129,9 @@ ParallelWorkerMain(Datum main_arg)
 	/* Set ParallelMasterBackendId so we know how to address temp relations. */
 	ParallelMasterBackendId = fps->parallel_master_backend_id;
 
+	/* Report pid of master process for progress information */
+	pgstat_report_leader_pid(fps->parallel_master_pid);
+
 	/*
 	 * We've initialized all of our state now; nothing should change
 	 * hereafter.
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index dc40cde..be754e8 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -883,8 +883,9 @@ CREATE VIEW pg_stat_bgwriter AS
 
 CREATE VIEW pg_stat_progress_vacuum AS
 	SELECT
-		S.pid AS pid, S.datid AS datid, D.datname AS datname,
-		S.relid AS relid,
+		S.pid,
+		S.datid,
+		S.relid,
 		CASE S.param1 WHEN 0 THEN 'initializing'
 					  WHEN 1 THEN 'scanning heap'
 					  WHEN 2 THEN 'vacuuming indexes'
@@ -893,11 +894,22 @@ CREATE VIEW pg_stat_progress_vacuum AS
 					  WHEN 5 THEN 'truncating heap'
 					  WHEN 6 THEN 'performing final cleanup'
 					  END AS phase,
-		S.param2 AS heap_blks_total, S.param3 AS heap_blks_scanned,
-		S.param4 AS heap_blks_vacuumed, S.param5 AS index_vacuum_count,
-		S.param6 AS max_dead_tuples, S.param7 AS num_dead_tuples
-    FROM pg_stat_get_progress_info('VACUUM') AS S
-		LEFT JOIN pg_database D ON S.datid = D.oid;
+		S.param2 AS heap_blks_total,
+		W.heap_blks_scanned,
+		W.heap_blks_vacuumed,
+		W.index_vacuum_count,
+		S.param6 AS max_dead_tuples,
+		W.num_dead_tuples
+	FROM pg_stat_get_progress_info('VACUUM') AS S,
+		(SELECT leader_pid,
+			max(param3) AS heap_blks_scanned,
+			max(param4) AS heap_blks_vacuumed,
+			max(param5) AS index_vacuum_count,
+			max(param7) AS num_dead_tuples
+		FROM pg_stat_get_progress_info('VACUUM')
+		GROUP BY leader_pid) AS W
+	WHERE
+		S.pid = W.leader_pid;
 
 CREATE VIEW pg_user_mappings AS
     SELECT
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index cbd6e9b..1484583 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -74,7 +74,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
 				  MultiXactId minMulti,
 				  TransactionId lastSaneFrozenXid,
 				  MultiXactId lastSaneMinMulti);
-static bool vacuum_rel(Oid relid, RangeVar *relation, int options,
+static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options,
 		   VacuumParams *params);
 
 /*
@@ -89,15 +89,15 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
 	VacuumParams params;
 
 	/* sanity checks on options */
-	Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
-	Assert((vacstmt->options & VACOPT_VACUUM) ||
-		   !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
-	Assert(!(vacstmt->options & VACOPT_SKIPTOAST));
+	Assert(vacstmt->options.flags & (VACOPT_VACUUM | VACOPT_ANALYZE));
+	Assert((vacstmt->options.flags & VACOPT_VACUUM) ||
+		   !(vacstmt->options.flags & (VACOPT_FULL | VACOPT_FREEZE)));
+	Assert(!(vacstmt->options.flags & VACOPT_SKIPTOAST));
 
 	/*
 	 * Make sure VACOPT_ANALYZE is specified if any column lists are present.
 	 */
-	if (!(vacstmt->options & VACOPT_ANALYZE))
+	if (!(vacstmt->options.flags & VACOPT_ANALYZE))
 	{
 		ListCell   *lc;
 
@@ -116,7 +116,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
 	 * All freeze ages are zero if the FREEZE option is given; otherwise pass
 	 * them as -1 which means to use the default values.
 	 */
-	if (vacstmt->options & VACOPT_FREEZE)
+	if (vacstmt->options.flags & VACOPT_FREEZE)
 	{
 		params.freeze_min_age = 0;
 		params.freeze_table_age = 0;
@@ -163,7 +163,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
  * memory context that will not disappear at transaction commit.
  */
 void
-vacuum(int options, List *relations, VacuumParams *params,
+vacuum(VacuumOptions options, List *relations, VacuumParams *params,
 	   BufferAccessStrategy bstrategy, bool isTopLevel)
 {
 	static bool in_vacuum = false;
@@ -174,7 +174,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 
 	Assert(params != NULL);
 
-	stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
+	stmttype = (options.flags & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
 
 	/*
 	 * We cannot run VACUUM inside a user transaction block; if we were inside
@@ -184,7 +184,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 	 *
 	 * ANALYZE (without VACUUM) can run either way.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 	{
 		PreventTransactionChain(isTopLevel, stmttype);
 		in_outer_xact = false;
@@ -206,17 +206,26 @@ vacuum(int options, List *relations, VacuumParams *params,
 	/*
 	 * Sanity check DISABLE_PAGE_SKIPPING option.
 	 */
-	if ((options & VACOPT_FULL) != 0 &&
-		(options & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
 
 	/*
+	 * Sanity check PARALLEL option.
+	 */
+	if ((options.flags & VACOPT_FULL) != 0 &&
+		(options.flags & VACOPT_PARALLEL) != 0)
+		ereport(ERROR,
+				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+				 errmsg("VACUUM option PARALLEL cannnot be used with FULL")));
+
+	/*
 	 * Send info about dead objects to the statistics collector, unless we are
 	 * in autovacuum --- autovacuum.c does this for itself.
 	 */
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 		pgstat_vacuum_stat();
 
 	/*
@@ -281,11 +290,11 @@ vacuum(int options, List *relations, VacuumParams *params,
 	 * transaction block, and also in an autovacuum worker, use own
 	 * transactions so we can release locks sooner.
 	 */
-	if (options & VACOPT_VACUUM)
+	if (options.flags & VACOPT_VACUUM)
 		use_own_xacts = true;
 	else
 	{
-		Assert(options & VACOPT_ANALYZE);
+		Assert(options.flags & VACOPT_ANALYZE);
 		if (IsAutoVacuumWorkerProcess())
 			use_own_xacts = true;
 		else if (in_outer_xact)
@@ -335,13 +344,13 @@ vacuum(int options, List *relations, VacuumParams *params,
 		{
 			VacuumRelation *vrel = lfirst_node(VacuumRelation, cur);
 
-			if (options & VACOPT_VACUUM)
+			if (options.flags & VACOPT_VACUUM)
 			{
 				if (!vacuum_rel(vrel->oid, vrel->relation, options, params))
 					continue;
 			}
 
-			if (options & VACOPT_ANALYZE)
+			if (options.flags & VACOPT_ANALYZE)
 			{
 				/*
 				 * If using separate xacts, start one for analyze. Otherwise,
@@ -354,7 +363,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 					PushActiveSnapshot(GetTransactionSnapshot());
 				}
 
-				analyze_rel(vrel->oid, vrel->relation, options, params,
+				analyze_rel(vrel->oid, vrel->relation, options.flags, params,
 							vrel->va_cols, in_outer_xact, vac_strategy);
 
 				if (use_own_xacts)
@@ -390,7 +399,7 @@ vacuum(int options, List *relations, VacuumParams *params,
 		StartTransactionCommand();
 	}
 
-	if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+	if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
 	{
 		/*
 		 * Update pg_database.datfrozenxid, and truncate pg_xact if possible.
@@ -1321,7 +1330,7 @@ vac_truncate_clog(TransactionId frozenXID,
  *		At entry and exit, we are not inside a transaction.
  */
 static bool
-vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
+vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options, VacuumParams *params)
 {
 	LOCKMODE	lmode;
 	Relation	onerel;
@@ -1342,7 +1351,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 */
 	PushActiveSnapshot(GetTransactionSnapshot());
 
-	if (!(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_FULL))
 	{
 		/*
 		 * In lazy vacuum, we can set the PROC_IN_VACUUM flag, which lets
@@ -1382,7 +1391,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * vacuum, but just ShareUpdateExclusiveLock for concurrent vacuum. Either
 	 * way, we can be sure that no other backend is vacuuming the same table.
 	 */
-	lmode = (options & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+	lmode = (options.flags & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
 
 	/*
 	 * Open the relation and get the appropriate lock on it.
@@ -1393,7 +1402,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * If we've been asked not to wait for the relation lock, acquire it first
 	 * in non-blocking mode, before calling try_relation_open().
 	 */
-	if (!(options & VACOPT_NOWAIT))
+	if (!(options.flags & VACOPT_NOWAIT))
 		onerel = try_relation_open(relid, lmode);
 	else if (ConditionalLockRelationOid(relid, lmode))
 		onerel = try_relation_open(relid, NoLock);
@@ -1510,7 +1519,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	 * us to process it.  In VACUUM FULL, though, the toast table is
 	 * automatically rebuilt by cluster_rel so we shouldn't recurse to it.
 	 */
-	if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
+	if (!(options.flags & VACOPT_SKIPTOAST) && !(options.flags & VACOPT_FULL))
 		toast_relid = onerel->rd_rel->reltoastrelid;
 	else
 		toast_relid = InvalidOid;
@@ -1529,7 +1538,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 	/*
 	 * Do the actual work --- either FULL or "lazy" vacuum
 	 */
-	if (options & VACOPT_FULL)
+	if (options.flags & VACOPT_FULL)
 	{
 		/* close relation before vacuuming, but hold lock until commit */
 		relation_close(onerel, NoLock);
@@ -1537,7 +1546,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
 
 		/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
 		cluster_rel(relid, InvalidOid, false,
-					(options & VACOPT_VERBOSE) != 0);
+					(options.flags & VACOPT_VERBOSE) != 0);
 	}
 	else
 		lazy_vacuum_rel(onerel, options, params, vac_strategy);
@@ -1591,8 +1600,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
  * hit dangling index pointers.
  */
 void
-vac_open_indexes(Relation relation, LOCKMODE lockmode,
-				 int *nindexes, Relation **Irel)
+vac_open_indexes(Relation relation, LOCKMODE lockmode, int *nindexes, Relation **Irel)
 {
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 30b1c08..9b5da1f 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -22,6 +22,20 @@
  * of index scans performed.  So we don't use maintenance_work_mem memory for
  * the TID array, just enough to hold as many heap tuples as fit on one page.
  *
+ * In PostgreSQL 11, we support a parallel option for lazy vacuum. In parallel
+ * lazy vacuum, multiple vacuum worker processes get blocks in parallel using
+ * parallel heap scan and process them. If a table with indexes the parallel
+ * vacuum workers vacuum the heap and indexes in parallel.  Also, since dead
+ * tuple TIDs is shared with all vacuum processes including the leader process
+ * the parallel vacuum processes have to make two synchronization points in
+ * lazy vacuum processing: when before starting vacuum and when before clearing
+ * dead tuple TIDs. In these two points the leader treats dead tuple TIDs as
+ * an arbiter. The information required by parallel lazy vacuum such as the
+ * statistics of table, parallel heap scan description have to be shared with
+ * all vacuum processes, and table statistics are funneled by the leader
+ * process after finished. Note that dead tuple TIDs need to be shared only
+ * when the table has indexes. For table with no indexes, each parallel worker
+ * processes blocks and vacuum them independently.
  *
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -41,8 +55,10 @@
 #include "access/heapam_xlog.h"
 #include "access/htup_details.h"
 #include "access/multixact.h"
+#include "access/relscan.h"
 #include "access/transam.h"
 #include "access/visibilitymap.h"
+#include "access/xact.h"
 #include "access/xlog.h"
 #include "catalog/catalog.h"
 #include "catalog/storage.h"
@@ -54,6 +70,7 @@
 #include "portability/instr_time.h"
 #include "postmaster/autovacuum.h"
 #include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
 #include "storage/freespace.h"
 #include "storage/lmgr.h"
 #include "utils/lsyscache.h"
@@ -103,10 +120,81 @@
  */
 #define PREFETCH_SIZE			((BlockNumber) 32)
 
+/* DSM key for parallel lazy vacuum */
+#define VACUUM_KEY_PARALLEL_SCAN	UINT64CONST(0xFFFFFFFFFFF00001)
+#define VACUUM_KEY_VACUUM_STATS		UINT64CONST(0xFFFFFFFFFFF00002)
+#define VACUUM_KEY_INDEX_STATS	    UINT64CONST(0xFFFFFFFFFFF00003)
+#define VACUUM_KEY_DEAD_TUPLE_CTL	UINT64CONST(0xFFFFFFFFFFF00004)
+#define VACUUM_KEY_DEAD_TUPLES		UINT64CONST(0xFFFFFFFFFFF00005)
+#define VACUUM_KEY_PARALLEL_STATE	UINT64CONST(0xFFFFFFFFFFF00006)
+
+/*
+ * see note of lazy_scan_heap_get_nextpage about forcing scanning of
+ * last page
+ */
+#define FORCE_CHECK_PAGE(blk) \
+	(blkno == (blk - 1) && should_attempt_truncation(vacrelstats))
+
+/* Check if given index is assigned to this parallel vacuum worker */
+#define IsAssignedIndex(i, pstate) \
+	(pstate == NULL || \
+	 (((i) % ((LVParallelState *) (pstate))->nworkers -1 ) == ParallelWorkerNumber))
+
+#define IsDeadTupleShared(lvstate) \
+	(((LVState *)(lvstate))->parallel_mode && ((LVState *)(lvstate))->vacrelstats->nindexes > 0)
+
+/* Vacuum worker state for parallel lazy vacuum */
+#define VACSTATE_SCAN			0x1	/* heap scan phase */
+#define VACSTATE_VACUUM			0x2	/* vacuuming on table and index */
+
+/*
+ * Vacuum relevant options and thresholds that we need share with parallel
+ * vacuum workers.
+ */
+typedef struct VacuumInfo
+{
+	int				options;	/* VACUUM options */
+	bool			aggressive;	/* does each worker need to aggressive vacuum? */
+	TransactionId	oldestxmin;
+	TransactionId	freezelimit;
+	MultiXactId		multixactcutoff;
+	int				elevel;
+} VacuumInfo;
+
+/* Struct for index statistics that are used for parallel lazy vacuum */
+typedef struct LVIndStats
+{
+	bool		updated;	/* need to be updated? */
+	BlockNumber	num_pages;
+	BlockNumber	num_tuples;
+} LVIndStats;
+
+/* Struct for parallel lazy vacuum state */
+typedef struct LVParallelState
+{
+	int nworkers;			/* # of process doing vacuum */
+	VacuumInfo	info;		/* vacuum relevant options to share */
+	int	state;				/* current parallel vacuum status */
+	int	finish_count;		/* the number of workers finished current state */
+	ConditionVariable cv;
+	slock_t	mutex;			/* protect above fields */
+} LVParallelState;
+
+/* Struct for control dead tuple TIDs array */
+typedef struct LVDeadTupleCtl
+{
+	int			dt_max;	/* # slots allocated in array */
+	int 		dt_count; /* # of dead tuple */
+
+	/* Used only for parallel lazy vacuum */
+	int			dt_index;	/* current index of dead tuple array used
+							   in lazy_vacuum_heap */
+	slock_t 	mutex;	/* protect above fields */
+} LVDeadTupleCtl;
+
 typedef struct LVRelStats
 {
-	/* hasindex = true means two-pass strategy; false means one-pass */
-	bool		hasindex;
+	int			nindexes; /* > 0 means two-pass strategy; = 0 means one-pass */
 	/* Overall statistics about rel */
 	BlockNumber old_rel_pages;	/* previous value of pg_class.relpages */
 	BlockNumber rel_pages;		/* total number of pages */
@@ -118,19 +206,46 @@ typedef struct LVRelStats
 	double		old_rel_tuples; /* previous value of pg_class.reltuples */
 	double		new_rel_tuples; /* new estimated total # of tuples */
 	double		new_dead_tuples;	/* new estimated total # of dead tuples */
-	BlockNumber pages_removed;
 	double		tuples_deleted;
-	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
-	/* List of TIDs of tuples we intend to delete */
-	/* NB: this list is ordered by TID address */
-	int			num_dead_tuples;	/* current # of entries */
-	int			max_dead_tuples;	/* # slots allocated in array */
-	ItemPointer dead_tuples;	/* array of ItemPointerData */
 	int			num_index_scans;
+	BlockNumber pages_removed;
+	BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
 	TransactionId latestRemovedXid;
 	bool		lock_waiter_detected;
 } LVRelStats;
 
+/* Struct for lazy vacuum execution */
+typedef struct LVState
+{
+	bool		parallel_mode;
+	LVRelStats *vacrelstats;
+	/*
+	 * Used when both parallel and non-parallel lazy vacuum, but in parallel
+	 * lazy vacuum and table with index, dtctl points to a dynamic shared memory
+	 * and controlled by dtctl struct.
+	 */
+	LVDeadTupleCtl	*dtctl;
+	ItemPointer	deadtuples;
+
+	/* Used only for parallel lazy vacuum */
+	ParallelContext *pcxt;
+	LVParallelState *pstate;
+	ParallelHeapScanDesc pscan;
+	LVIndStats *indstats;
+} LVState;
+
+/*
+ * Scan description data for lazy vacuum. In parallel lazy vacuum,
+ * we use only heapscan instead.
+ */
+typedef struct LVScanDescData
+{
+	BlockNumber lv_cblock;					/* current scanning block number */
+	BlockNumber lv_next_unskippable_block;	/* next block number we cannot skip */
+	BlockNumber lv_nblocks;					/* the number blocks of relation */
+	HeapScanDesc heapscan;					/* field for parallel lazy vacuum */
+} LVScanDescData;
+typedef struct LVScanDescData *LVScanDesc;
 
 /* A few variables that don't seem worth passing around as parameters */
 static int	elevel = -1;
@@ -141,32 +256,47 @@ static MultiXactId MultiXactCutoff;
 
 static BufferAccessStrategy vac_strategy;
 
-
-/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
-			   LVRelStats *vacrelstats, Relation *Irel, int nindexes,
-			   bool aggressive);
-static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
+/* nonf-export function prototypes */
+static void lazy_vacuum_heap(Relation onerel, LVState *lvstate);
 static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
 static void lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats);
-static void lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats);
+							  LVState *lvstate);
+static void lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat);
 static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer);
 static bool should_attempt_truncation(LVRelStats *vacrelstats);
 static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
 static BlockNumber count_nondeletable_pages(Relation onerel,
 						 LVRelStats *vacrelstats);
-static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
-static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr);
+static void lazy_space_alloc(LVState *lvstate, BlockNumber relblocks);
+static void lazy_record_dead_tuple(LVState *state, ItemPointer itemptr);
 static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
 static int	vac_cmp_itemptr(const void *left, const void *right);
 static bool heap_page_is_all_visible(Relation rel, Buffer buf,
 						 TransactionId *visibility_cutoff_xid, bool *all_frozen);
+static void do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irels,
+							  int nindexes, int options, bool aggressive);
+static void lazy_scan_heap(Relation rel, LVState *lvstate, VacuumOptions options,
+						   bool aggressive);
+
+/* function prototypes for parallel vacuum */
+static void lazy_gather_vacuum_stats(ParallelContext *pxct,
+									 LVRelStats *valrelstats);
+static void lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats);
+static void lazy_initialize_dsm(ParallelContext *pcxt, Relation onrel,
+								LVState *lvstate, int options, bool aggressive);
+static LVState *lazy_initialize_worker(shm_toc *toc);
+static LVScanDesc lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan);
+static void lv_endscan(LVScanDesc lvscan);
+static BlockNumber lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+											   LVScanDesc lvscan,
+											   bool *all_visible_according_to_vm,
+											   Buffer *vmbuffer, int options, bool aggressive);
+static void lazy_prepare_vacuum(LVState *lvstate);
+static void lazy_end_vacuum(LVState *lvstate);
+static long lazy_get_max_dead_tuples(LVRelStats *vacrelstats);
 
 
 /*
@@ -179,12 +309,11 @@ static bool heap_page_is_all_visible(Relation rel, Buffer buf,
  *		and locked the relation.
  */
 void
-lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+lazy_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
 				BufferAccessStrategy bstrategy)
 {
-	LVRelStats *vacrelstats;
-	Relation   *Irel;
-	int			nindexes;
+	LVState		*lvstate;
+	LVRelStats	*vacrelstats;
 	PGRUsage	ru0;
 	TimestampTz starttime = 0;
 	long		secs;
@@ -211,7 +340,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 		starttime = GetCurrentTimestamp();
 	}
 
-	if (options & VACOPT_VERBOSE)
+	if (options.flags & VACOPT_VERBOSE)
 		elevel = INFO;
 	else
 		elevel = DEBUG2;
@@ -239,10 +368,12 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 											   xidFullScanLimit);
 	aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
 											  mxactFullScanLimit);
-	if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+	if (options.flags & VACOPT_DISABLE_PAGE_SKIPPING)
 		aggressive = true;
 
+	lvstate = (LVState *) palloc0(sizeof(LVState));
 	vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
+	lvstate->vacrelstats = vacrelstats;
 
 	vacrelstats->old_rel_pages = onerel->rd_rel->relpages;
 	vacrelstats->old_rel_tuples = onerel->rd_rel->reltuples;
@@ -250,15 +381,8 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	vacrelstats->pages_removed = 0;
 	vacrelstats->lock_waiter_detected = false;
 
-	/* Open all indexes of the relation */
-	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
-	vacrelstats->hasindex = (nindexes > 0);
-
 	/* Do the vacuuming */
-	lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
-
-	/* Done with indexes */
-	vac_close_indexes(nindexes, Irel, NoLock);
+	lazy_scan_heap(onerel, lvstate, options, aggressive);
 
 	/*
 	 * Compute whether we actually scanned the all unfrozen pages. If we did,
@@ -267,7 +391,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 	 * NB: We need to check this before truncating the relation, because that
 	 * will change ->rel_pages.
 	 */
-	if ((vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
+	if ((lvstate->vacrelstats->scanned_pages + vacrelstats->frozenskipped_pages)
 		< vacrelstats->rel_pages)
 	{
 		Assert(!aggressive);
@@ -329,7 +453,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
 						new_rel_pages,
 						new_rel_tuples,
 						new_rel_allvisible,
-						vacrelstats->hasindex,
+						(vacrelstats->nindexes != 0),
 						new_frozen_xid,
 						new_min_multi,
 						false);
@@ -439,28 +563,180 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
 }
 
 /*
- *	lazy_scan_heap() -- scan an open heap relation
+ * If the number of workers is specified more than 0, we enter the parallel lazy
+ * vacuum mode. In parallel lazy vacuum mode, we initialize a dynamic shared memory
+ * and launch parallel vacuum workers. The launcher process also vacuums the table
+ * after launched and then waits for the all vacuum workers to finish. After all vacuum
+ * workers finished we gather the vacuum statistics of table and indexes, and update
+ * them.
+ */
+static void
+lazy_scan_heap(Relation onerel, LVState *lvstate, VacuumOptions options,
+			   bool aggressive)
+{
+	ParallelContext	*pcxt;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+	Relation	*Irel;
+	int			nindexes;
+
+	lvstate->parallel_mode = options.nworkers > 0;
+
+	/* Open indexes */
+	vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
+	vacrelstats->nindexes = nindexes;
+
+	if (lvstate->parallel_mode)
+	{
+		EnterParallelMode();
+
+		/* Create parallel context and initialize it */
+		pcxt = CreateParallelContext("postgres", "LazyVacuumWorkerMain",
+									 options.nworkers);
+		lvstate->pcxt = pcxt;
+
+		/* Estimate DSM size for parallel vacuum */
+		lazy_estimate_dsm(pcxt, lvstate->vacrelstats);
+
+		/* Initialize DSM for parallel vacuum */
+		InitializeParallelDSM(pcxt);
+		lazy_initialize_dsm(pcxt, onerel, lvstate, options.flags, aggressive);
+
+		/* Set master pid to itself */
+		pgstat_report_leader_pid(MyProcPid);
+
+		/* Launch workers */
+		LaunchParallelWorkers(pcxt);
+	}
+
+	do_lazy_scan_heap(lvstate, onerel, Irel, nindexes, options.flags, aggressive);
+
+	/*
+	 * We can update relation statistics such as scanned page after gathered
+	 * statistics from all workers. Also, in parallel mode since we cannot update
+	 * index statistics at the same time the leader process have to do it.
+	 *
+	 * XXX : If we allows workers to update statistics tuples at the same time
+	 * the updating index statistics can be done in lazy_cleanup_index().
+	 */
+	if (lvstate->parallel_mode)
+	{
+		int i;
+		LVIndStats *indstats = palloc(sizeof(LVIndStats) * lvstate->vacrelstats->nindexes);
+
+		/* Wait for workers finished vacuum */
+		WaitForParallelWorkersToFinish(pcxt);
+
+		/* Gather the result of vacuum statistics from all workers */
+		lazy_gather_vacuum_stats(pcxt, vacrelstats);
+
+		/* Now we can compute the new value for pg_class.reltuples */
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 vacrelstats->rel_pages,
+															 vacrelstats->scanned_pages,
+															 vacrelstats->scanned_tuples);
+
+		/* Copy new index stats to local memory */
+		memcpy(indstats, lvstate->indstats, sizeof(LVIndStats) * vacrelstats->nindexes);
+
+		DestroyParallelContext(pcxt);
+		ExitParallelMode();
+
+		/* After exit parallel mode, update index statistics */
+		for (i = 0; i < vacrelstats->nindexes; i++)
+		{
+			Relation	ind = Irel[i];
+			LVIndStats *indstat = (LVIndStats *) &(indstats[i]);
+
+			if (indstat->updated)
+			   vac_update_relstats(ind,
+								   indstat->num_pages,
+								   indstat->num_tuples,
+								   0,
+								   false,
+								   InvalidTransactionId,
+								   InvalidMultiXactId,
+								   false);
+		}
+
+		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
+									 vacrelstats->scanned_pages);
+		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED,
+									 vacrelstats->rel_pages);
+		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_INDEX_VACUUMS,
+									 vacrelstats->num_index_scans);
+	}
+
+	vac_close_indexes(nindexes, Irel, RowExclusiveLock);
+}
+
+/*
+ * Entry point of parallel vacuum worker.
+ */
+void
+LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc)
+{
+	LVState		*lvstate;
+	Relation rel;
+	Relation *indrel;
+	int nindexes_worker;
+
+	/* Look up dynamic shared memory and initialize */
+	lvstate = lazy_initialize_worker(toc);
+
+	Assert(lvstate != NULL);
+
+	rel = relation_open(lvstate->pscan->phs_relid, ShareUpdateExclusiveLock);
+
+	/* Open all indexes */
+	vac_open_indexes(rel, RowExclusiveLock, &nindexes_worker,
+					 &indrel);
+
+	pgstat_progress_start_command(PROGRESS_COMMAND_VACUUM,
+								  RelationGetRelid(rel));
+	/* Do lazy vacuum */
+	do_lazy_scan_heap(lvstate, rel, indrel, lvstate->vacrelstats->nindexes,
+					  lvstate->pstate->info.options, lvstate->pstate->info.aggressive);
+
+	pgstat_progress_end_command();
+
+	vac_close_indexes(lvstate->vacrelstats->nindexes, indrel, RowExclusiveLock);
+	heap_close(rel, ShareUpdateExclusiveLock);
+}
+
+/*
+ *	do_lazy_scan_heap() -- scan an open heap relation
  *
  *		This routine prunes each page in the heap, which will among other
  *		things truncate dead tuples to dead line pointers, defragment the
- *		page, and set commit status bits (see heap_page_prune).  It also builds
+ *		page, and set commit status bits (see heap_page_prune).  It also uses
  *		lists of dead tuples and pages with free space, calculates statistics
  *		on the number of live tuples in the heap, and marks pages as
  *		all-visible if appropriate.  When done, or when we run low on space for
- *		dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
- *		to reclaim dead line pointers.
+ *		dead-tuple TIDs, invoke vacuuming of assigned indexes and call lazy_vacuum_heap
+ *		to reclaim dead line pointers. In parallel vacuum, we need to synchronize
+ *		at where scanning heap finished and vacuuming heap finished. The vacuum
+ *		worker reached to that point first need to wait for other vacuum workers
+ *		reached to the same point.
+ *
+ *		In parallel lazy scan, we get next page number using parallel heap scan.
+ *		Since the dead tuple TIDs are shared with all vacuum workers, we have to
+ *		wait for all other workers to reach to the same points where before starting
+ *		reclaiming dead tuple TIDs and before clearing dead tuple TIDs information
+ *		in dynamic shared memory.
  *
  *		If there are no indexes then we can reclaim line pointers on the fly;
  *		dead line pointers need only be retained until all index pointers that
  *		reference them have been killed.
  */
 static void
-lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
-			   Relation *Irel, int nindexes, bool aggressive)
+do_lazy_scan_heap(LVState *lvstate, Relation onerel, Relation *Irel,
+				  int nindexes, int options, bool aggressive)
 {
-	BlockNumber nblocks,
-				blkno;
+	LVRelStats *vacrelstats = lvstate->vacrelstats;
+	BlockNumber blkno;
+	BlockNumber nblocks;
 	HeapTupleData tuple;
+	LVScanDesc lvscan;
 	char	   *relname;
 	BlockNumber empty_pages,
 				vacuumed_pages;
@@ -468,14 +744,15 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 				tups_vacuumed,
 				nkeep,
 				nunused;
+	int			dt_vacuum_threshold;
 	IndexBulkDeleteResult **indstats;
 	int			i;
 	PGRUsage	ru0;
 	Buffer		vmbuffer = InvalidBuffer;
-	BlockNumber next_unskippable_block;
-	bool		skipping_blocks;
 	xl_heap_freeze_tuple *frozen;
 	StringInfoData buf;
+	bool		all_visible_according_to_vm = false;
+
 	const int	initprog_index[] = {
 		PROGRESS_VACUUM_PHASE,
 		PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -504,89 +781,27 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->nonempty_pages = 0;
 	vacrelstats->latestRemovedXid = InvalidTransactionId;
 
-	lazy_space_alloc(vacrelstats, nblocks);
+	lazy_space_alloc(lvstate, nblocks);
 	frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
 
+	/* Begin heap scan for vacuum */
+	lvscan = lv_beginscan(onerel, lvstate->pscan);
+
 	/* Report that we're scanning the heap, advertising total # of blocks */
 	initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
 	initprog_val[1] = nblocks;
-	initprog_val[2] = vacrelstats->max_dead_tuples;
+	initprog_val[2] = lvstate->dtctl->dt_max;
 	pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
 
-	/*
-	 * Except when aggressive is set, we want to skip pages that are
-	 * all-visible according to the visibility map, but only when we can skip
-	 * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
-	 * sequentially, the OS should be doing readahead for us, so there's no
-	 * gain in skipping a page now and then; that's likely to disable
-	 * readahead and so be counterproductive. Also, skipping even a single
-	 * page means that we can't update relfrozenxid, so we only want to do it
-	 * if we can skip a goodly number of pages.
-	 *
-	 * When aggressive is set, we can't skip pages just because they are
-	 * all-visible, but we can still skip pages that are all-frozen, since
-	 * such pages do not need freezing and do not affect the value that we can
-	 * safely set for relfrozenxid or relminmxid.
-	 *
-	 * Before entering the main loop, establish the invariant that
-	 * next_unskippable_block is the next block number >= blkno that we can't
-	 * skip based on the visibility map, either all-visible for a regular scan
-	 * or all-frozen for an aggressive scan.  We set it to nblocks if there's
-	 * no such block.  We also set up the skipping_blocks flag correctly at
-	 * this stage.
-	 *
-	 * Note: The value returned by visibilitymap_get_status could be slightly
-	 * out-of-date, since we make this test before reading the corresponding
-	 * heap page or locking the buffer.  This is OK.  If we mistakenly think
-	 * that the page is all-visible or all-frozen when in fact the flag's just
-	 * been cleared, we might fail to vacuum the page.  It's easy to see that
-	 * skipping a page when aggressive is not set is not a very big deal; we
-	 * might leave some dead tuples lying around, but the next vacuum will
-	 * find them.  But even when aggressive *is* set, it's still OK if we miss
-	 * a page whose all-frozen marking has just been cleared.  Any new XIDs
-	 * just added to that page are necessarily newer than the GlobalXmin we
-	 * computed, so they'll have no effect on the value to which we can safely
-	 * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
-	 *
-	 * We will scan the table's last page, at least to the extent of
-	 * determining whether it has tuples or not, even if it should be skipped
-	 * according to the above rules; except when we've already determined that
-	 * it's not worth trying to truncate the table.  This avoids having
-	 * lazy_truncate_heap() take access-exclusive lock on the table to attempt
-	 * a truncation that just fails immediately because there are tuples in
-	 * the last page.  This is worth avoiding mainly because such a lock must
-	 * be replayed on any hot standby, where it can be disruptive.
-	 */
-	next_unskippable_block = 0;
-	if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-	{
-		while (next_unskippable_block < nblocks)
-		{
-			uint8		vmstatus;
-
-			vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
-												&vmbuffer);
-			if (aggressive)
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
-					break;
-			}
-			else
-			{
-				if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
-					break;
-			}
-			vacuum_delay_point();
-			next_unskippable_block++;
-		}
-	}
-
-	if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
-		skipping_blocks = true;
+	if (lvstate->parallel_mode)
+		dt_vacuum_threshold = MaxHeapTuplesPerPage *
+			(lvstate->pstate->nworkers + 1);
 	else
-		skipping_blocks = false;
+		dt_vacuum_threshold = MaxHeapTuplesPerPage;
 
-	for (blkno = 0; blkno < nblocks; blkno++)
+	while((blkno = lazy_scan_get_nextpage(onerel, lvstate, lvscan,
+										  &all_visible_according_to_vm,
+										  &vmbuffer, options, aggressive)) != InvalidBlockNumber)
 	{
 		Buffer		buf;
 		Page		page;
@@ -597,99 +812,35 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		int			prev_dead_count;
 		int			nfrozen;
 		Size		freespace;
-		bool		all_visible_according_to_vm = false;
 		bool		all_visible;
 		bool		all_frozen = true;	/* provided all_visible is also true */
 		bool		has_dead_tuples;
 		TransactionId visibility_cutoff_xid = InvalidTransactionId;
+		int			dtmax;
+		int			dtcount;
 
-		/* see note above about forcing scanning of last page */
-#define FORCE_CHECK_PAGE() \
-		(blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
-
+		elog(NOTICE, "pid %d blkno %d", MyProcPid, blkno);
 		pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
 
-		if (blkno == next_unskippable_block)
-		{
-			/* Time to advance next_unskippable_block */
-			next_unskippable_block++;
-			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
-			{
-				while (next_unskippable_block < nblocks)
-				{
-					uint8		vmskipflags;
-
-					vmskipflags = visibilitymap_get_status(onerel,
-														   next_unskippable_block,
-														   &vmbuffer);
-					if (aggressive)
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
-							break;
-					}
-					else
-					{
-						if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
-							break;
-					}
-					vacuum_delay_point();
-					next_unskippable_block++;
-				}
-			}
-
-			/*
-			 * We know we can't skip the current block.  But set up
-			 * skipping_blocks to do the right thing at the following blocks.
-			 */
-			if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
-				skipping_blocks = true;
-			else
-				skipping_blocks = false;
-
-			/*
-			 * Normally, the fact that we can't skip this block must mean that
-			 * it's not all-visible.  But in an aggressive vacuum we know only
-			 * that it's not all-frozen, so it might still be all-visible.
-			 */
-			if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
-				all_visible_according_to_vm = true;
-		}
-		else
-		{
-			/*
-			 * The current block is potentially skippable; if we've seen a
-			 * long enough run of skippable blocks to justify skipping it, and
-			 * we're not forced to check it, then go ahead and skip.
-			 * Otherwise, the page must be at least all-visible if not
-			 * all-frozen, so we can set all_visible_according_to_vm = true.
-			 */
-			if (skipping_blocks && !FORCE_CHECK_PAGE())
-			{
-				/*
-				 * Tricky, tricky.  If this is in aggressive vacuum, the page
-				 * must have been all-frozen at the time we checked whether it
-				 * was skippable, but it might not be any more.  We must be
-				 * careful to count it as a skipped all-frozen page in that
-				 * case, or else we'll think we can't update relfrozenxid and
-				 * relminmxid.  If it's not an aggressive vacuum, we don't
-				 * know whether it was all-frozen, so we have to recheck; but
-				 * in this case an approximate answer is OK.
-				 */
-				if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
-					vacrelstats->frozenskipped_pages++;
-				continue;
-			}
-			all_visible_according_to_vm = true;
-		}
-
 		vacuum_delay_point();
 
 		/*
 		 * If we are close to overrunning the available space for dead-tuple
 		 * TIDs, pause and do a cycle of vacuuming before we tackle this page.
+		 * We don't need to acquire lock because dt_max should not be changed
+		 * while running vacuum.
 		 */
-		if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
-			vacrelstats->num_dead_tuples > 0)
+		if (IsDeadTupleShared(lvstate))
+		{
+			SpinLockAcquire(&lvstate->dtctl->mutex);
+			dtcount = lvstate->dtctl->dt_count;
+			SpinLockRelease(&lvstate->dtctl->mutex);
+		}
+		else
+			dtcount = lvstate->dtctl->dt_count;
+
+		dtmax = lvstate->dtctl->dt_max;
+		if (((dtmax - dtcount) < dt_vacuum_threshold) &&	dtcount > 0)
 		{
 			const int	hvp_index[] = {
 				PROGRESS_VACUUM_PHASE,
@@ -698,6 +849,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			int64		hvp_val[2];
 
 			/*
+			 * Here we're about to vacuum the table and indexes actually. Before
+			 * entering vacuum state, we have to wait for other vacuum worker to
+			 * reach here.
+			 */
+			lazy_prepare_vacuum(lvstate);
+
+			/*
 			 * Before beginning index vacuuming, we release any pin we may
 			 * hold on the visibility map page.  This isn't necessary for
 			 * correctness, but we do it anyway to avoid holding the pin
@@ -716,11 +874,12 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
 
-			/* Remove index entries */
+			/* Remove assigned index entries */
 			for (i = 0; i < nindexes; i++)
-				lazy_vacuum_index(Irel[i],
-								  &indstats[i],
-								  vacrelstats);
+			{
+				if (IsAssignedIndex(i, lvstate->pstate))
+					lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+			}
 
 			/*
 			 * Report that we are now vacuuming the heap.  We also increase
@@ -733,19 +892,22 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			pgstat_progress_update_multi_param(2, hvp_index, hvp_val);
 
 			/* Remove tuples from heap */
-			lazy_vacuum_heap(onerel, vacrelstats);
+			lazy_vacuum_heap(onerel, lvstate);
 
 			/*
-			 * Forget the now-vacuumed tuples, and press on, but be careful
-			 * not to reset latestRemovedXid since we want that value to be
-			 * valid.
+			 * Here we've done vacuum on the heap and index and we are going
+			 * to begin the next round scan on heap. Wait until all vacuum worker
+			 * finished vacuum. After all vacuum workers finished, forget the
+			 * now-vacuumed tuples, and press on, but be careful not to reset
+			 * latestRemoveXid since we want that value to be valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
-			vacrelstats->num_index_scans++;
+			lazy_end_vacuum(lvstate);
 
 			/* Report that we are once again scanning the heap */
 			pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 										 PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
+			vacrelstats->num_index_scans++;
 		}
 
 		/*
@@ -771,7 +933,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * it's OK to skip vacuuming pages we get a lock conflict on. They
 			 * will be dealt with in some future vacuum.
 			 */
-			if (!aggressive && !FORCE_CHECK_PAGE())
+			if (!aggressive && !FORCE_CHECK_PAGE(blkno))
 			{
 				ReleaseBuffer(buf);
 				vacrelstats->pinskipped_pages++;
@@ -923,7 +1085,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		has_dead_tuples = false;
 		nfrozen = 0;
 		hastup = false;
-		prev_dead_count = vacrelstats->num_dead_tuples;
+		prev_dead_count = lvstate->dtctl->dt_count;
 		maxoff = PageGetMaxOffsetNumber(page);
 
 		/*
@@ -962,7 +1124,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 */
 			if (ItemIdIsDead(itemid))
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				all_visible = false;
 				continue;
 			}
@@ -1067,7 +1229,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 			if (tupgone)
 			{
-				lazy_record_dead_tuple(vacrelstats, &(tuple.t_self));
+				lazy_record_dead_tuple(lvstate, &(tuple.t_self));
 				HeapTupleHeaderAdvanceLatestRemovedXid(tuple.t_data,
 													   &vacrelstats->latestRemovedXid);
 				tups_vacuumed += 1;
@@ -1132,13 +1294,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/*
 		 * If there are no indexes then we can vacuum the page right now
-		 * instead of doing a second scan.
+		 * instead of doing a second scan. Because each parallel worker uses its
+		 * own dead tuple area they can vacuum independently.
 		 */
-		if (nindexes == 0 &&
-			vacrelstats->num_dead_tuples > 0)
+		if (Irel == NULL && lvstate->dtctl->dt_count > 0)
 		{
 			/* Remove tuples from heap */
-			lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
+			lazy_vacuum_page(onerel, blkno, buf, 0, lvstate, &vmbuffer);
 			has_dead_tuples = false;
 
 			/*
@@ -1146,7 +1308,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 			 * not to reset latestRemovedXid since we want that value to be
 			 * valid.
 			 */
-			vacrelstats->num_dead_tuples = 0;
+			lvstate->dtctl->dt_count = 0;
+
 			vacuumed_pages++;
 		}
 
@@ -1249,12 +1412,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		 * page, so remember its free space as-is.  (This path will always be
 		 * taken if there are no indexes.)
 		 */
-		if (vacrelstats->num_dead_tuples == prev_dead_count)
+		if (lvstate->dtctl->dt_count == prev_dead_count)
 			RecordPageWithFreeSpace(onerel, blkno, freespace);
 	}
 
 	/* report that everything is scanned and vacuumed */
-	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
+	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED,
+								 RelationGetNumberOfBlocks(onerel));
 
 	pfree(frozen);
 
@@ -1264,10 +1428,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	vacrelstats->new_dead_tuples = nkeep;
 
 	/* now we can compute the new value for pg_class.reltuples */
-	vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
-														 nblocks,
-														 vacrelstats->tupcount_pages,
-														 num_tuples);
+	if (!lvstate->parallel_mode)
+		vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+															 nblocks,
+															 vacrelstats->tupcount_pages,
+															 num_tuples);
 
 	/*
 	 * Release any remaining pin on visibility map page.
@@ -1280,7 +1445,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 	/* If any tuples need to be deleted, perform final vacuum cycle */
 	/* XXX put a threshold on min number of tuples here? */
-	if (vacrelstats->num_dead_tuples > 0)
+	if (lvstate->dtctl->dt_count > 0)
 	{
 		const int	hvp_index[] = {
 			PROGRESS_VACUUM_PHASE,
@@ -1288,6 +1453,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		};
 		int64		hvp_val[2];
 
+		/*
+		 * Here we're about to vacuum the table and indexes actually. Before
+		 * entering vacuum state, we have to wait for other vacuum worker to
+		 * reach here.
+		 */
+		lazy_prepare_vacuum(lvstate);
+
 		/* Log cleanup info before we touch indexes */
 		vacuum_log_cleanup_info(onerel, vacrelstats);
 
@@ -1297,9 +1469,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 
 		/* Remove index entries */
 		for (i = 0; i < nindexes; i++)
-			lazy_vacuum_index(Irel[i],
-							  &indstats[i],
-							  vacrelstats);
+		{
+			if (IsAssignedIndex(i, lvstate->pstate))
+				lazy_vacuum_index(Irel[i], &indstats[i], lvstate);
+		}
 
 		/* Report that we are now vacuuming the heap */
 		hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
@@ -1309,18 +1482,25 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 		/* Remove tuples from heap */
 		pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 									 PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
-		lazy_vacuum_heap(onerel, vacrelstats);
+
+		lazy_vacuum_heap(onerel, lvstate);
+
 		vacrelstats->num_index_scans++;
 	}
 
 	/* report all blocks vacuumed; and that we're cleaning up */
-	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
+	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED,
+								 RelationGetNumberOfBlocks(onerel));
 	pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
 								 PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
 
 	/* Do post-vacuum cleanup and statistics update for each index */
 	for (i = 0; i < nindexes; i++)
-		lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+	{
+		if (IsAssignedIndex(i, lvstate->pstate))
+			lazy_cleanup_index(Irel[i], indstats[i], lvstate->vacrelstats,
+							   lvstate->parallel_mode ? &(lvstate->indstats[i]) : NULL);
+	}
 
 	/* If no indexes, make log report that lazy_vacuum_heap would've made */
 	if (vacuumed_pages)
@@ -1329,6 +1509,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 						RelationGetRelationName(onerel),
 						tups_vacuumed, vacuumed_pages)));
 
+	lv_endscan(lvscan);
+
 	/*
 	 * This is pretty messy, but we split it up so that we can skip emitting
 	 * individual parts of the message when not applicable.
@@ -1362,6 +1544,35 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
 	pfree(buf.data);
 }
 
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+lazy_gather_vacuum_stats(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	int	i;
+	LVRelStats *lvstats_list;
+
+	lvstats_list = (LVRelStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_VACUUM_STATS, false);
+
+	/* Gather each worker stats */
+	for (i = 0; i < pcxt->nworkers_launched; i++)
+	{
+		LVRelStats *wstats = (LVRelStats*) ((char *) lvstats_list + sizeof(LVRelStats) * i);
+
+		vacrelstats->scanned_pages += wstats->scanned_pages;
+		vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+		vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+		vacrelstats->scanned_tuples += wstats->scanned_tuples;
+		vacrelstats->new_dead_tuples += wstats->new_dead_tuples;
+		vacrelstats->pages_removed += wstats->pages_removed;
+		vacrelstats->tuples_deleted += wstats->tuples_deleted;
+		vacrelstats->nonempty_pages += wstats->nonempty_pages;
+	}
+
+	/* all vacuum workers have same value of rel_pages */
+	vacrelstats->rel_pages = lvstats_list->rel_pages;
+}
 
 /*
  *	lazy_vacuum_heap() -- second pass over the heap
@@ -1375,18 +1586,27 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
  * process index entry removal in batches as large as possible.
  */
 static void
-lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
+lazy_vacuum_heap(Relation onerel, LVState *lvstate)
 {
 	int			tupindex;
 	int			npages;
 	PGRUsage	ru0;
+	BlockNumber	prev_tblk;
 	Buffer		vmbuffer = InvalidBuffer;
+	ItemPointer	deadtuples = lvstate->deadtuples;
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+	BlockNumber	ntuples = 0;
+	StringInfoData	buf;
 
 	pg_rusage_init(&ru0);
 	npages = 0;
 
 	tupindex = 0;
-	while (tupindex < vacrelstats->num_dead_tuples)
+
+	elog(WARNING, "min blk %u, max blk %u", ItemPointerGetBlockNumber(&deadtuples[0]),
+		 ItemPointerGetBlockNumber(&deadtuples[dtctl->dt_count - 1]));
+
+	while (tupindex < dtctl->dt_count)
 	{
 		BlockNumber tblk;
 		Buffer		buf;
@@ -1395,7 +1615,42 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 
 		vacuum_delay_point();
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		/*
+		 * If the dead tuple TIDs are shared among with all vacuum workers,
+		 * we acquire the lock and advance tupindex before vacuuming.
+		 *
+		 * NB: The number of maximum tuple can be stored into single
+		 * page is not a large number in most cases. We can use spinlock
+		 * here.
+		 */
+		if (IsDeadTupleShared(lvstate))
+		{
+			SpinLockAcquire(&(dtctl->mutex));
+
+			tupindex = dtctl->dt_index;
+
+			if (tupindex >= dtctl->dt_count)
+			{
+				SpinLockRelease(&(dtctl->mutex));
+				break;
+			}
+
+			/* Advance dtct->dt_index */
+			for (prev_tblk = tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
+				 dtctl->dt_index < dtctl->dt_count;
+				 dtctl->dt_index++)
+			{
+				tblk = ItemPointerGetBlockNumber(&deadtuples[dtctl->dt_index]);
+				if (prev_tblk != tblk)
+					break;
+
+				ntuples++;
+			}
+
+			SpinLockRelease(&(dtctl->mutex));
+		}
+
+		tblk = ItemPointerGetBlockNumber(&deadtuples[tupindex]);
 		buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
 								 vac_strategy);
 		if (!ConditionalLockBufferForCleanup(buf))
@@ -1404,7 +1659,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 			++tupindex;
 			continue;
 		}
-		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, vacrelstats,
+		tupindex = lazy_vacuum_page(onerel, tblk, buf, tupindex, lvstate,
 									&vmbuffer);
 
 		/* Now that we've compacted the page, record its available space */
@@ -1422,10 +1677,12 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
 		vmbuffer = InvalidBuffer;
 	}
 
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "\"%s\": removed %d row versions in %d pages",
+					 RelationGetRelationName(onerel), ntuples, npages);
 	ereport(elevel,
-			(errmsg("\"%s\": removed %d row versions in %d pages",
-					RelationGetRelationName(onerel),
-					tupindex, npages),
+			(errmsg("%s", buf.data),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1435,34 +1692,32 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
  *
  * Caller must hold pin and buffer cleanup lock on the buffer.
  *
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
- * tuple for this page.  We assume the rest follow sequentially.
- * The return value is the first tupindex after the tuples of this page.
  */
 static int
 lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
-				 int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer)
+				 int tupindex, LVState *lvstate, Buffer *vmbuffer)
 {
 	Page		page = BufferGetPage(buffer);
 	OffsetNumber unused[MaxOffsetNumber];
 	int			uncnt = 0;
 	TransactionId visibility_cutoff_xid;
 	bool		all_frozen;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
 
 	pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
 
 	START_CRIT_SECTION();
 
-	for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+	for (; tupindex < lvstate->dtctl->dt_count; tupindex++)
 	{
 		BlockNumber tblk;
 		OffsetNumber toff;
 		ItemId		itemid;
 
-		tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+		tblk = ItemPointerGetBlockNumber(&lvstate->deadtuples[tupindex]);
 		if (tblk != blkno)
 			break;				/* past end of tuples for this block */
-		toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+		toff = ItemPointerGetOffsetNumber(&lvstate->deadtuples[tupindex]);
 		itemid = PageGetItemId(page, toff);
 		ItemIdSetUnused(itemid);
 		unused[uncnt++] = toff;
@@ -1587,14 +1842,15 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
  *	lazy_vacuum_index() -- vacuum one index relation.
  *
  *		Delete all the index entries pointing to tuples listed in
- *		vacrelstats->dead_tuples, and update running statistics.
+ *		lvstate->deadtuples, and update running statistics.
  */
 static void
 lazy_vacuum_index(Relation indrel,
 				  IndexBulkDeleteResult **stats,
-				  LVRelStats *vacrelstats)
+				  LVState	*lvstate)
 {
 	IndexVacuumInfo ivinfo;
+	StringInfoData buf;
 	PGRUsage	ru0;
 
 	pg_rusage_init(&ru0);
@@ -1603,17 +1859,19 @@ lazy_vacuum_index(Relation indrel,
 	ivinfo.analyze_only = false;
 	ivinfo.estimated_count = true;
 	ivinfo.message_level = elevel;
-	ivinfo.num_heap_tuples = vacrelstats->old_rel_tuples;
+	ivinfo.num_heap_tuples = lvstate->vacrelstats->old_rel_tuples;
 	ivinfo.strategy = vac_strategy;
 
 	/* Do bulk deletion */
-	*stats = index_bulk_delete(&ivinfo, *stats,
-							   lazy_tid_reaped, (void *) vacrelstats);
+	*stats = index_bulk_delete(&ivinfo, *stats, lazy_tid_reaped, (void *) lvstate);
+
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "scanned index \"%s\" to remove %d row versions",
+					 RelationGetRelationName(indrel), lvstate->dtctl->dt_count);
 
 	ereport(elevel,
-			(errmsg("scanned index \"%s\" to remove %d row versions",
-					RelationGetRelationName(indrel),
-					vacrelstats->num_dead_tuples),
+			(errmsg("%s", buf.data),
 			 errdetail_internal("%s", pg_rusage_show(&ru0))));
 }
 
@@ -1621,11 +1879,11 @@ lazy_vacuum_index(Relation indrel,
  *	lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  */
 static void
-lazy_cleanup_index(Relation indrel,
-				   IndexBulkDeleteResult *stats,
-				   LVRelStats *vacrelstats)
+lazy_cleanup_index(Relation indrel, IndexBulkDeleteResult *stats,
+							   LVRelStats *vacrelstats, LVIndStats *indstat)
 {
 	IndexVacuumInfo ivinfo;
+	StringInfoData	buf;
 	PGRUsage	ru0;
 
 	pg_rusage_init(&ru0);
@@ -1639,34 +1897,52 @@ lazy_cleanup_index(Relation indrel,
 
 	stats = index_vacuum_cleanup(&ivinfo, stats);
 
+	/* Will be updated by leader process after vacuumed */
+	if (indstat)
+		indstat->updated = false;
+
 	if (!stats)
 		return;
 
 	/*
 	 * Now update statistics in pg_class, but only if the index says the count
-	 * is accurate.
+	 * is accurate. In parallel lazy vacuum, the worker can not update these
+	 * information by itself, so save to DSM and then the launcher process
+	 * updates it later.
 	 */
 	if (!stats->estimated_count)
-		vac_update_relstats(indrel,
-							stats->num_pages,
-							stats->num_index_tuples,
-							0,
-							false,
-							InvalidTransactionId,
-							InvalidMultiXactId,
-							false);
+	{
+		if (indstat)
+		{
+			indstat->updated = true;
+			indstat->num_pages = stats->num_pages;
+			indstat->num_tuples = stats->num_index_tuples;
+		}
+		else
+			vac_update_relstats(indrel,
+								stats->num_pages,
+								stats->num_index_tuples,
+								0,
+								false,
+								InvalidTransactionId,
+								InvalidMultiXactId,
+								false);
+	}
 
+	initStringInfo(&buf);
+	appendStringInfo(&buf,
+					 "index \"%s\" now contains %.0f row versions in %u pages",
+					 RelationGetRelationName(indrel),
+					 stats->num_index_tuples,
+					 stats->num_pages);
 	ereport(elevel,
-			(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
-					RelationGetRelationName(indrel),
-					stats->num_index_tuples,
-					stats->num_pages),
-			 errdetail("%.0f index row versions were removed.\n"
-					   "%u index pages have been deleted, %u are currently reusable.\n"
-					   "%s.",
-					   stats->tuples_removed,
-					   stats->pages_deleted, stats->pages_free,
-					   pg_rusage_show(&ru0))));
+			(errmsg("%s", buf.data),
+					errdetail("%.0f index row versions were removed.\n"
+							  "%u index pages have been deleted, %u are currently reusable.\n"
+							  "%s.",
+							  stats->tuples_removed,
+							  stats->pages_deleted, stats->pages_free,
+							  pg_rusage_show(&ru0))));
 
 	pfree(stats);
 }
@@ -1976,59 +2252,70 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
 /*
  * lazy_space_alloc - space allocation decisions for lazy vacuum
  *
+ * In parallel lazy vacuum the space for dead tuple locations are already
+ * allocated in dynamic shared memory, so we allocate space for dead tuple
+ * locations in local memory only when in not parallel lazy vacuum and set
+ * MyDeadTuple.
+ *
  * See the comments at the head of this file for rationale.
  */
 static void
-lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
+lazy_space_alloc(LVState *lvstate, BlockNumber relblocks)
 {
-	long		maxtuples;
-	int			vac_work_mem = IsAutoVacuumWorkerProcess() &&
-	autovacuum_work_mem != -1 ?
-	autovacuum_work_mem : maintenance_work_mem;
-
-	if (vacrelstats->hasindex)
-	{
-		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
-		maxtuples = Min(maxtuples, INT_MAX);
-		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+	long maxtuples;
 
-		/* curious coding here to ensure the multiplication can't overflow */
-		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
-			maxtuples = relblocks * LAZY_ALLOC_TUPLES;
+	/*
+	 * In parallel mode, we already set the pointer to dead tuple
+	 * array when initialize.
+	 */
+	if (lvstate->parallel_mode && lvstate->vacrelstats->nindexes > 0)
+		return;
 
-		/* stay sane if small maintenance_work_mem */
-		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
-	}
-	else
-	{
-		maxtuples = MaxHeapTuplesPerPage;
-	}
+	maxtuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
 
-	vacrelstats->num_dead_tuples = 0;
-	vacrelstats->max_dead_tuples = (int) maxtuples;
-	vacrelstats->dead_tuples = (ItemPointer)
-		palloc(maxtuples * sizeof(ItemPointerData));
+	/*
+	 * If in not parallel lazy vacuum, we need to allocate dead
+	 * tuple array in local memory.
+	 */
+	lvstate->deadtuples = palloc0(sizeof(ItemPointerData) * (int)maxtuples);
+	lvstate->dtctl = (LVDeadTupleCtl *) palloc(sizeof(LVDeadTupleCtl));
+	lvstate->dtctl->dt_max = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+	lvstate->dtctl->dt_count = 0;
 }
 
 /*
  * lazy_record_dead_tuple - remember one deletable tuple
+ *
+ * Acquiring the spinlock before remember is required if the dead tuple
+ * TIDs are shared with other vacuum workers.
  */
 static void
-lazy_record_dead_tuple(LVRelStats *vacrelstats,
-					   ItemPointer itemptr)
+lazy_record_dead_tuple(LVState *lvstate, ItemPointer itemptr)
 {
+	LVDeadTupleCtl *dtctl = lvstate->dtctl;
+
+	if (IsDeadTupleShared(lvstate))
+		SpinLockAcquire(&(dtctl->mutex));
+
+	if (dtctl->dt_count >= dtctl->dt_max)
+		elog(ERROR, "dead tuple array overflow");
+
 	/*
 	 * The array must never overflow, since we rely on all deletable tuples
 	 * being removed; inability to remove a tuple might cause an old XID to
 	 * persist beyond the freeze limit, which could be disastrous later on.
 	 */
-	if (vacrelstats->num_dead_tuples >= vacrelstats->max_dead_tuples)
-		elog(ERROR, "dead tuple array overflow");
+	if (dtctl->dt_count < dtctl->dt_max)
+	{
+
+		lvstate->deadtuples[dtctl->dt_count] = *itemptr;
+		(dtctl->dt_count)++;
+		pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
+									 dtctl->dt_count);
+	}
 
-	vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
-	vacrelstats->num_dead_tuples++;
-	pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
-								 vacrelstats->num_dead_tuples);
+	if (IsDeadTupleShared(lvstate))
+		SpinLockRelease(&(dtctl->mutex));
 }
 
 /*
@@ -2041,16 +2328,23 @@ lazy_record_dead_tuple(LVRelStats *vacrelstats,
 static bool
 lazy_tid_reaped(ItemPointer itemptr, void *state)
 {
-	LVRelStats *vacrelstats = (LVRelStats *) state;
+	LVState *lvstate = (LVState *) state;
 	ItemPointer res;
 
+	/*
+	 * We can assume that the dead tuple TIDs are sorted by TID location
+	 * even when we shared the dead tuple TIDs with other vacuum workers.
+	 */
 	res = (ItemPointer) bsearch((void *) itemptr,
-								(void *) vacrelstats->dead_tuples,
-								vacrelstats->num_dead_tuples,
+								(void *) lvstate->deadtuples,
+								lvstate->dtctl->dt_count,
 								sizeof(ItemPointerData),
 								vac_cmp_itemptr);
 
-	return (res != NULL);
+	if (res != NULL)
+		return true;
+
+	return false;
 }
 
 /*
@@ -2194,3 +2488,627 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 
 	return all_visible;
 }
+
+/*
+ * Return the block number we need to scan next, or InvalidBlockNumber if scan
+ * is done.
+ *
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages.  Since we're reading
+ * sequentially, the OS should be doing readahead for us, so there's no
+ * gain in skipping a page now and then; that's likely to disable
+ * readahead and so be counterproductive. Also, skipping even a single
+ * page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * Before entering the main loop, establish the invariant that
+ * next_unskippable_block is the next block number >= blkno that we can't
+ * skip based on the visibility map, either all-visible for a regular scan
+ * or all-frozen for an aggressive scan.  We set it to nblocks if there's
+ * no such block.  We also set up the skipping_blocks flag correctly at
+ * this stage.
+ *
+ * In not parallel mode, before entering the main loop, establish the
+ * invariant that next_unskippable_block is the next block number >= blkno
+ * that's not we can't skip based on the visibility map, either all-visible
+ * for a regular scan or all-frozen for an aggressive scan.  We set it to
+ * nblocks if there's no such block.  We also set up the skipping_blocks
+ * flag correctly at this stage.
+ *
+ * In parallel mode, pstate is not NULL. We scan heap pages
+ * using parallel heap scan description. Each worker calls heap_parallelscan_nextpage()
+ * in order to exclusively get  block number we need to scan at next.
+ * If given block is all-visible according to visibility map, we skip to
+ * scan this block immediately unlike not parallel lazy scan.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding
+ * heap page or locking the buffer.  This is OK.  If we mistakenly think
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page.  It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them.  But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared.  Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * Computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid.  A similar argument applies for MXIDs and relminmxid.
+ *
+ * We will scan the table's last page, at least to the extent of
+ * determining whether it has tuples or not, even if it should be skipped
+ * according to the above rules; except when we've already determined that
+ * it's not worth trying to truncate the table.  This avoids having
+ * lazy_truncate_heap() take access-exclusive lock on the table to attempt
+ * a truncation that just fails immediately because there are tuples in
+ * the last page.  This is worth avoiding mainly because such a lock must
+ * be replayed on any hot standby, where it can be disruptive.
+ */
+static BlockNumber
+lazy_scan_get_nextpage(Relation onerel, LVState *lvstate,
+					   LVScanDesc lvscan, bool *all_visible_according_to_vm,
+					   Buffer *vmbuffer, int options, bool aggressive)
+{
+	BlockNumber blkno;
+	LVRelStats	*vacrelstats = lvstate->vacrelstats;
+
+	if (lvstate->parallel_mode)
+	{
+		/*
+		 * In parallel lazy vacuum since it's hard to know how many consecutive
+		 * all-visible pages exits on table we skip to scan the heap page immediately.
+		 * if it is all-visible page.
+		 */
+		while ((blkno = heap_parallelscan_nextpage(lvscan->heapscan)) != InvalidBlockNumber)
+		{
+			*all_visible_according_to_vm = false;
+			vacuum_delay_point();
+
+			/* Consider to skip scan page according visibility map */
+			if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0 &&
+				!FORCE_CHECK_PAGE(blkno))
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, blkno, vmbuffer);
+
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+					{
+						vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+					else if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+						*all_visible_according_to_vm = true;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+					{
+						if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+							vacrelstats->frozenskipped_pages++;
+						continue;
+					}
+				}
+			}
+
+			/* We need to scan current blkno, break */
+			break;
+		}
+	}
+	else
+	{
+		bool skipping_blocks = false;
+
+		/* Initialize lv_nextunskippable_page if needed */
+		if (lvscan->lv_cblock == 0 && (options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+		{
+			while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+			{
+				uint8		vmstatus;
+
+				vmstatus = visibilitymap_get_status(onerel, lvscan->lv_next_unskippable_block,
+													vmbuffer);
+				if (aggressive)
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+						break;
+				}
+				else
+				{
+					if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+						break;
+				}
+				vacuum_delay_point();
+				lvscan->lv_next_unskippable_block++;
+			}
+
+			if (lvscan->lv_next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+				skipping_blocks = true;
+			else
+				skipping_blocks = false;
+		}
+
+		/* Decide the block number we need to scan */
+		for (blkno = lvscan->lv_cblock; blkno < lvscan->lv_nblocks; blkno++)
+		{
+			if (blkno == lvscan->lv_next_unskippable_block)
+			{
+				/* Time to advance next_unskippable_block */
+				lvscan->lv_next_unskippable_block++;
+				if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+				{
+					while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+					{
+						uint8		vmstatus;
+
+						vmstatus = visibilitymap_get_status(onerel,
+															lvscan->lv_next_unskippable_block,
+															vmbuffer);
+						if (aggressive)
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+								break;
+						}
+						else
+						{
+							if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+								break;
+						}
+						vacuum_delay_point();
+						lvscan->lv_next_unskippable_block++;
+					}
+				}
+
+				/*
+				 * We know we can't skip the current block.  But set up
+				 * skipping_all_visible_blocks to do the right thing at the
+				 * following blocks.
+				 */
+				if (lvscan->lv_next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
+					skipping_blocks = true;
+				else
+					skipping_blocks = false;
+
+				/*
+				 * Normally, the fact that we can't skip this block must mean that
+				 * it's not all-visible.  But in an aggressive vacuum we know only
+				 * that it's not all-frozen, so it might still be all-visible.
+				 */
+				if (aggressive && VM_ALL_VISIBLE(onerel, blkno, vmbuffer))
+					*all_visible_according_to_vm = true;
+
+				/* Found out that next unskippable block number */
+				break;
+			}
+			else
+			{
+				/*
+				 * The current block is potentially skippable; if we've seen a
+				 * long enough run of skippable blocks to justify skipping it, and
+				 * we're not forced to check it, then go ahead and skip.
+				 * Otherwise, the page must be at least all-visible if not
+				 * all-frozen, so we can set all_visible_according_to_vm = true.
+				 */
+				if (skipping_blocks && !FORCE_CHECK_PAGE(blkno))
+				{
+					/*
+					 * Tricky, tricky.  If this is in aggressive vacuum, the page
+					 * must have been all-frozen at the time we checked whether it
+					 * was skippable, but it might not be any more.  We must be
+					 * careful to count it as a skipped all-frozen page in that
+					 * case, or else we'll think we can't update relfrozenxid and
+					 * relminmxid.  If it's not an aggressive vacuum, we don't
+					 * know whether it was all-frozen, so we have to recheck; but
+					 * in this case an approximate answer is OK.
+					 */
+					if (aggressive || VM_ALL_FROZEN(onerel, blkno, vmbuffer))
+						vacrelstats->frozenskipped_pages++;
+					continue;
+				}
+
+				*all_visible_according_to_vm = true;
+
+				/* We need to scan current blkno, break */
+				break;
+			}
+		} /* for */
+
+		/* Advance the current block number for the next scan */
+		lvscan->lv_cblock = blkno + 1;
+	}
+
+	return (blkno == lvscan->lv_nblocks) ? InvalidBlockNumber : blkno;
+}
+
+/*
+ * Begin lazy vacuum scan. lvscan->heapscan is NULL if
+ * we're not in parallel lazy vacuum.
+ */
+static LVScanDesc
+lv_beginscan(Relation onerel, ParallelHeapScanDesc pscan)
+{
+	LVScanDesc lvscan;
+
+	lvscan = (LVScanDesc) palloc(sizeof(LVScanDescData));
+
+	lvscan->lv_cblock = 0;
+	lvscan->lv_next_unskippable_block = 0;
+	lvscan->lv_nblocks = RelationGetNumberOfBlocks(onerel);
+
+	if (pscan != NULL)
+	{
+		lvscan->heapscan = heap_beginscan_parallel(onerel, pscan);
+		heap_parallelscan_startblock_init(lvscan->heapscan);
+	}
+	else
+		lvscan->heapscan = NULL;
+
+	return lvscan;
+}
+
+/*
+ * End lazy vacuum scan.
+ */
+static void
+lv_endscan(LVScanDesc lvscan)
+{
+	if (lvscan->heapscan != NULL)
+		heap_endscan(lvscan->heapscan);
+	pfree(lvscan);
+}
+
+/* ----------------------------------------------------------------
+ *						Parallel Lazy Vacuum Support
+ * ----------------------------------------------------------------
+ */
+
+/*
+ * Estimate storage for parallel lazy vacuum.
+ */
+static void
+lazy_estimate_dsm(ParallelContext *pcxt, LVRelStats *vacrelstats)
+{
+	Size size = 0;
+	int keys = 0;
+	int vacuum_workers = pcxt->nworkers + 1;
+	long maxtuples = lazy_get_max_dead_tuples(vacrelstats);
+
+	/* Estimate size for parallel heap scan */
+	size += heap_parallelscan_estimate(SnapshotAny);
+	keys++;
+
+	/* Estimate size for vacuum statistics for only workers*/
+	size += BUFFERALIGN(mul_size(sizeof(LVRelStats), pcxt->nworkers));
+	keys++;
+
+	/* We have to share dead tuple information only when the table has indexes */
+	if (vacrelstats->nindexes > 0)
+	{
+		/* Estimate size for index statistics */
+		size += BUFFERALIGN(mul_size(sizeof(LVIndStats), vacrelstats->nindexes));
+		keys++;
+
+		/* Estimate size for dead tuple control */
+		size += BUFFERALIGN(sizeof(LVDeadTupleCtl));
+		keys++;
+
+		/* Estimate size for dead tuple array */
+		size += BUFFERALIGN(mul_size(
+							 mul_size(sizeof(ItemPointerData), maxtuples),
+							 vacuum_workers));
+		keys++;
+	}
+
+	/* Estimate size for parallel lazy vacuum state */
+	size += BUFFERALIGN(sizeof(LVParallelState));
+	keys++;
+
+	/* Estimate size for vacuum task */
+	size += BUFFERALIGN(sizeof(VacuumInfo));
+	keys++;
+
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
+	shm_toc_estimate_keys(&pcxt->estimator, keys);
+}
+
+/*
+ * Initialize dynamic shared memory for parallel lazy vacuum. We store
+ * relevant informations of parallel heap scanning, dead tuple array
+ * and vacuum statistics for each worker and some parameters for lazy vacuum.
+ */
+static void
+lazy_initialize_dsm(ParallelContext *pcxt, Relation onerel, LVState *lvstate,
+					int options, bool aggressive)
+{
+	ParallelHeapScanDesc pscan_ptr;
+	ItemPointer	deadtuples_ptr;
+	char 		*lvrelstats_ptr;
+	LVParallelState *pstate_ptr;
+	LVIndStats	*indstats_ptr;
+	LVDeadTupleCtl	*dtctl_ptr;
+	int i;
+	int deadtuples_size;
+	int lvrelstats_size;
+	int	vacuum_workers = pcxt->nworkers + 1;
+	long max_tuples = lazy_get_max_dead_tuples(lvstate->vacrelstats);
+
+	/* Allocate and initialize DSM for vacuum stats for each worker */
+	lvrelstats_size = mul_size(sizeof(LVRelStats), pcxt->nworkers);
+	lvrelstats_ptr = shm_toc_allocate(pcxt->toc, lvrelstats_size);
+	for (i = 0; i < pcxt->nworkers; i++)
+	{
+		char *start;
+
+		start = lvrelstats_ptr + i * sizeof(LVRelStats);
+		memcpy(start, lvstate->vacrelstats, sizeof(LVRelStats));
+	}
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_STATS, lvrelstats_ptr);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Allocate and initialize DSM for dead tuple control */
+		dtctl_ptr = (LVDeadTupleCtl *) shm_toc_allocate(pcxt->toc, sizeof(LVDeadTupleCtl));
+		SpinLockInit(&(dtctl_ptr->mutex));
+		dtctl_ptr->dt_max = max_tuples * vacuum_workers;
+		dtctl_ptr->dt_count = 0;
+		dtctl_ptr->dt_index = 0;
+		lvstate->dtctl = dtctl_ptr;
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLE_CTL, dtctl_ptr);
+
+		/* Allocate and initialize DSM for dead tuple array */
+		deadtuples_size = mul_size(mul_size(sizeof(ItemPointerData), max_tuples),
+								   vacuum_workers);
+		deadtuples_ptr = (ItemPointer) shm_toc_allocate(pcxt->toc,
+														deadtuples_size);
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLES, deadtuples_ptr);
+		lvstate->deadtuples = deadtuples_ptr;
+
+		/* Allocate DSM for index statistics */
+		indstats_ptr = (LVIndStats *) shm_toc_allocate(pcxt->toc,
+													   mul_size(sizeof(LVIndStats),
+																lvstate->vacrelstats->nindexes));
+		shm_toc_insert(pcxt->toc, VACUUM_KEY_INDEX_STATS, indstats_ptr);
+		lvstate->indstats = indstats_ptr;
+	}
+
+	/* Allocate and initialize DSM for parallel scan description */
+	pscan_ptr = (ParallelHeapScanDesc) shm_toc_allocate(pcxt->toc,
+														heap_parallelscan_estimate(SnapshotAny));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_SCAN, pscan_ptr);
+	heap_parallelscan_initialize(pscan_ptr, onerel, SnapshotAny);
+	lvstate->pscan = pscan_ptr;
+
+	/* Allocate and initialize DSM for parallel vacuum state */
+	pstate_ptr = (LVParallelState *) shm_toc_allocate(pcxt->toc, sizeof(LVParallelState));
+	shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_STATE, pstate_ptr);
+
+	ConditionVariableInit(&(pstate_ptr->cv));
+	SpinLockInit(&(pstate_ptr->mutex));
+	pstate_ptr->nworkers = vacuum_workers;
+	pstate_ptr->state = VACSTATE_SCAN;
+	pstate_ptr->info.aggressive = aggressive;
+	pstate_ptr->info.options = options;
+	pstate_ptr->info.oldestxmin = OldestXmin;
+	pstate_ptr->info.freezelimit = FreezeLimit;
+	pstate_ptr->info.multixactcutoff = MultiXactCutoff;
+	pstate_ptr->info.elevel = elevel;
+	lvstate->pstate = pstate_ptr;
+}
+
+/*
+ * Initialize parallel lazy vacuum for worker.
+ */
+static LVState *
+lazy_initialize_worker(shm_toc *toc)
+{
+	LVState	*lvstate;
+	char *lvstats;
+
+	lvstate = (LVState *) palloc(sizeof(LVState));
+	lvstate->parallel_mode = true;
+
+	/* Set up vacuum stats */
+	lvstats = shm_toc_lookup(toc, VACUUM_KEY_VACUUM_STATS, false);
+	lvstate->vacrelstats = (LVRelStats *) (lvstats +
+										   sizeof(LVRelStats) * ParallelWorkerNumber);
+
+	if (lvstate->vacrelstats->nindexes > 0)
+	{
+		/* Set up dead tuple control */
+		lvstate->dtctl = (LVDeadTupleCtl *) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLE_CTL, false);
+
+		/* Set up dead tuple array */
+		lvstate->deadtuples = (ItemPointer) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLES, false);
+
+		/* Set up index statistics */
+		lvstate->indstats = (LVIndStats *) shm_toc_lookup(toc, VACUUM_KEY_INDEX_STATS, false);
+	}
+
+	/* Set up parallel vacuum state */
+	lvstate->pstate = (LVParallelState *) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_STATE, false);
+
+	/* Set up parallel heap scan description */
+	lvstate->pscan = (ParallelHeapScanDesc) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_SCAN, false);
+
+	/* Set up parameters for lazy vacuum */
+	OldestXmin = lvstate->pstate->info.oldestxmin;
+	FreezeLimit = lvstate->pstate->info.freezelimit;
+	MultiXactCutoff = lvstate->pstate->info.multixactcutoff;
+	elevel = lvstate->pstate->info.elevel;
+
+	return lvstate;
+}
+
+/*
+ * In the end of actual vacuumming on table and indexes actually, we have
+ * to wait for other all vacuum workers to reach here before clearing dead
+ * tuple TIDs information.
+ */
+static void
+lazy_end_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+	{
+		lvstate->dtctl->dt_count = 0;
+		return;
+	}
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		/* Fetch shared information */
+		if (!counted)
+			pstate->finish_count++;
+		finish_count = pstate->finish_count;
+		state = pstate->state;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_SCAN)
+			break;
+
+		/*
+		 * Wake up other workers if counted up if first time to reach here and
+		 * is a parallel worker.
+		 */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * If all launched parallel vacuum workers reached here, we can clear the
+		 * dead tuple TIDs information.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			/* Clear dead tuples */
+			lvstate->dtctl->dt_count = 0;
+
+			SpinLockAcquire(&pstate->mutex);
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_SCAN;
+			SpinLockRelease(&pstate->mutex);
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_DONE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Before starting actual vacuuming on table and indexes, we have to wait for
+ * other all vacuum workers so that all worker can see the same dead tuple TIDs
+ * information when vacuuming.
+ */
+static void
+lazy_prepare_vacuum(LVState *lvstate)
+{
+	LVParallelState *pstate = lvstate->pstate;
+	bool counted = false;
+
+	/* Exit if in not parallel vacuum */
+	if (!lvstate->parallel_mode)
+		return;
+
+	while (true)
+	{
+		int finish_count;
+		int state;
+
+		SpinLockAcquire(&(pstate->mutex));
+
+		if (!counted)
+			pstate->finish_count++;
+		state = pstate->state;
+		finish_count = pstate->finish_count;
+
+		SpinLockRelease(&(pstate->mutex));
+
+		if (state == VACSTATE_VACUUM)
+			break;
+
+		/*
+		 * Wake up other workers if counted up if first time to reach here and
+		 * is a parallel worker.
+		 */
+		if (!counted && IsParallelWorker())
+			ConditionVariableBroadcast(&pstate->cv);
+
+		counted = true;
+
+		/*
+		 * The leader process can change parallel vacuum state if all workers
+		 * have reached here.
+		 */
+		if (!IsParallelWorker() && finish_count == (lvstate->pcxt->nworkers_launched + 1))
+		{
+			qsort((void *) lvstate->deadtuples, lvstate->dtctl->dt_count,
+				  sizeof(ItemPointerData), vac_cmp_itemptr);
+
+			SpinLockAcquire(&pstate->mutex);
+			pstate->finish_count = 0;
+			pstate->state = VACSTATE_VACUUM;
+			SpinLockRelease(&pstate->mutex);
+
+			ConditionVariableBroadcast(&pstate->cv);
+			break;
+		}
+
+		ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_VACUUM_PREPARE);
+	}
+
+	ConditionVariableCancelSleep();
+}
+
+/*
+ * Return the number of maximum dead tuples can be stored according
+ * to vac_work_mem.
+ */
+static long
+lazy_get_max_dead_tuples(LVRelStats *vacrelstats)
+{
+	long maxtuples;
+	int	vac_work_mem = IsAutoVacuumWorkerProcess() &&
+		autovacuum_work_mem != -1 ?
+		autovacuum_work_mem : maintenance_work_mem;
+
+	if (vacrelstats->nindexes != 0)
+	{
+		maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+		maxtuples = Min(maxtuples, INT_MAX);
+		maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+
+		/* curious coding here to ensure the multiplication can't overflow */
+		if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > vacrelstats->old_rel_pages)
+			maxtuples = vacrelstats->old_rel_pages * LAZY_ALLOC_TUPLES;
+
+		/* stay sane if small maintenance_work_mem */
+		maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+	}
+	else
+	{
+		maxtuples = MaxHeapTuplesPerPage;
+	}
+
+	return maxtuples;
+}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 7a70001..8cba9c8 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1662,7 +1662,12 @@ _equalDropdbStmt(const DropdbStmt *a, const DropdbStmt *b)
 static bool
 _equalVacuumStmt(const VacuumStmt *a, const VacuumStmt *b)
 {
-	COMPARE_SCALAR_FIELD(options);
+	if (a->options.flags != b->options.flags)
+		return false;
+
+	if (a->options.nworkers != b->options.nworkers)
+		return false;
+
 	COMPARE_NODE_FIELD(rels);
 
 	return true;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 4c83a63..399640f 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -187,6 +187,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
 			   bool *deferrable, bool *initdeferred, bool *not_valid,
 			   bool *no_inherit, core_yyscan_t yyscanner);
 static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
+static VacuumOptions *makeVacOpt(VacuumOption flag, int nworkers);
 
 %}
 
@@ -237,6 +238,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	struct ImportQual	*importqual;
 	InsertStmt			*istmt;
 	VariableSetStmt		*vsetstmt;
+	VacuumOptions		*vacopts;
 	PartitionElem		*partelem;
 	PartitionSpec		*partspec;
 	PartitionBoundSpec	*partboundspec;
@@ -305,7 +307,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 				create_extension_opt_item alter_extension_opt_item
 
 %type <ival>	opt_lock lock_type cast_context
-%type <ival>	vacuum_option_list vacuum_option_elem
+%type <vacopts>	vacuum_option_list vacuum_option_elem
 %type <boolean>	opt_or_replace
 				opt_grant_grant_option opt_grant_admin_option
 				opt_nowait opt_if_exists opt_with_data
@@ -10152,32 +10154,40 @@ cluster_index_specification:
 VacuumStmt: VACUUM opt_full opt_freeze opt_verbose opt_vacuum_relation_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 0);
 					if ($2)
-						n->options |= VACOPT_FULL;
+						vacopts->flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						vacopts->flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
 					n->rels = $5;
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 			| VACUUM opt_full opt_freeze opt_verbose AnalyzeStmt
 				{
 					VacuumStmt *n = (VacuumStmt *) $5;
-					n->options |= VACOPT_VACUUM;
+					n->options.flags |= VACOPT_VACUUM;
 					if ($2)
-						n->options |= VACOPT_FULL;
+						n->options.flags |= VACOPT_FULL;
 					if ($3)
-						n->options |= VACOPT_FREEZE;
+						n->options.flags |= VACOPT_FREEZE;
 					if ($4)
-						n->options |= VACOPT_VERBOSE;
+						n->options.flags |= VACOPT_VERBOSE;
+					n->options.nworkers = 0;
 					$$ = (Node *)n;
 				}
 			| VACUUM '(' vacuum_option_list ')' opt_vacuum_relation_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_VACUUM | $3;
+					VacuumOptions	*vacopts = $3;
+
+					n->options.flags = vacopts->flags | VACOPT_VACUUM;
+					n->options.nworkers = vacopts->nworkers;
 					n->rels = $5;
 					$$ = (Node *) n;
 				}
@@ -10185,18 +10195,38 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose opt_vacuum_relation_list
 
 vacuum_option_list:
 			vacuum_option_elem								{ $$ = $1; }
-			| vacuum_option_list ',' vacuum_option_elem		{ $$ = $1 | $3; }
+			| vacuum_option_list ',' vacuum_option_elem
+			{
+				VacuumOptions *vacopts1 = (VacuumOptions *)$1;
+				VacuumOptions *vacopts2 = (VacuumOptions *)$3;
+
+				vacopts1->flags |= vacopts2->flags;
+				if (vacopts2->flags == VACOPT_PARALLEL)
+					vacopts1->nworkers = vacopts2->nworkers;
+
+				$$ = vacopts1;
+				pfree(vacopts2);
+			}
 		;
 
 vacuum_option_elem:
-			analyze_keyword		{ $$ = VACOPT_ANALYZE; }
-			| VERBOSE			{ $$ = VACOPT_VERBOSE; }
-			| FREEZE			{ $$ = VACOPT_FREEZE; }
-			| FULL				{ $$ = VACOPT_FULL; }
+			analyze_keyword		{ $$ = makeVacOpt(VACOPT_ANALYZE, 0); }
+			| VERBOSE			{ $$ = makeVacOpt(VACOPT_VERBOSE, 0); }
+			| FREEZE			{ $$ = makeVacOpt(VACOPT_FREEZE, 0); }
+			| FULL				{ $$ = makeVacOpt(VACOPT_FULL, 0); }
+			| PARALLEL ICONST
+				{
+					if ($2 < 1)
+						ereport(ERROR,
+								(errcode(ERRCODE_SYNTAX_ERROR),
+								 errmsg("parallel vacuum degree must be more than 1"),
+								 parser_errposition(@1)));
+					$$ = makeVacOpt(VACOPT_PARALLEL, $2);
+				}
 			| IDENT
 				{
 					if (strcmp($1, "disable_page_skipping") == 0)
-						$$ = VACOPT_DISABLE_PAGE_SKIPPING;
+						$$ = makeVacOpt(VACOPT_DISABLE_PAGE_SKIPPING, 0);
 					else
 						ereport(ERROR,
 								(errcode(ERRCODE_SYNTAX_ERROR),
@@ -10208,11 +10238,16 @@ vacuum_option_elem:
 AnalyzeStmt: analyze_keyword opt_verbose opt_vacuum_relation_list
 				{
 					VacuumStmt *n = makeNode(VacuumStmt);
-					n->options = VACOPT_ANALYZE;
+					VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 0);
+
 					if ($2)
-						n->options |= VACOPT_VERBOSE;
+						vacopts->flags |= VACOPT_VERBOSE;
+
+					n->options.flags = vacopts->flags;
+					n->options.nworkers = 0;
 					n->rels = $3;
 					$$ = (Node *)n;
+					pfree(vacopts);
 				}
 		;
 
@@ -15915,6 +15950,16 @@ makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
 	return (Node *) s;
 }
 
+static VacuumOptions *
+makeVacOpt(VacuumOption flag, int nworkers)
+{
+	VacuumOptions *vacopt = palloc(sizeof(VacuumOptions));
+
+	vacopt->flags = flag;
+	vacopt->nworkers = nworkers;
+	return vacopt;
+}
+
 /* parser_init()
  * Initialize to parse one query string
  */
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index c04c0b5..6d89052 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -189,7 +189,7 @@ typedef struct av_relation
 typedef struct autovac_table
 {
 	Oid			at_relid;
-	int			at_vacoptions;	/* bitmask of VacuumOption */
+	VacuumOptions at_vacoptions;	/* contains bitmask of VacuumOption */
 	VacuumParams at_params;
 	int			at_vacuum_cost_delay;
 	int			at_vacuum_cost_limit;
@@ -2466,7 +2466,7 @@ do_autovacuum(void)
 			 * next table in our list.
 			 */
 			HOLD_INTERRUPTS();
-			if (tab->at_vacoptions & VACOPT_VACUUM)
+			if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 				errcontext("automatic vacuum of table \"%s.%s.%s\"",
 						   tab->at_datname, tab->at_nspname, tab->at_relname);
 			else
@@ -2818,6 +2818,7 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 		int			vac_cost_limit;
 		int			vac_cost_delay;
 		int			log_min_duration;
+		int			vacuum_parallel_workers;
 
 		/*
 		 * Calculate the vacuum cost parameters and the freeze ages.  If there
@@ -2864,13 +2865,19 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
 			? avopts->multixact_freeze_table_age
 			: default_multixact_freeze_table_age;
 
+		vacuum_parallel_workers = (avopts &&
+								   avopts->vacuum_parallel_workers >= 0)
+			? avopts->vacuum_parallel_workers
+			: 0;
+
 		tab = palloc(sizeof(autovac_table));
 		tab->at_relid = relid;
 		tab->at_sharedrel = classForm->relisshared;
-		tab->at_vacoptions = VACOPT_SKIPTOAST |
+		tab->at_vacoptions.flags = VACOPT_SKIPTOAST |
 			(dovacuum ? VACOPT_VACUUM : 0) |
 			(doanalyze ? VACOPT_ANALYZE : 0) |
 			(!wraparound ? VACOPT_NOWAIT : 0);
+		tab->at_vacoptions.nworkers = vacuum_parallel_workers;
 		tab->at_params.freeze_min_age = freeze_min_age;
 		tab->at_params.freeze_table_age = freeze_table_age;
 		tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
@@ -3116,10 +3123,10 @@ autovac_report_activity(autovac_table *tab)
 	int			len;
 
 	/* Report the command and possible options */
-	if (tab->at_vacoptions & VACOPT_VACUUM)
+	if (tab->at_vacoptions.flags & VACOPT_VACUUM)
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: VACUUM%s",
-				 tab->at_vacoptions & VACOPT_ANALYZE ? " ANALYZE" : "");
+				 tab->at_vacoptions.flags & VACOPT_ANALYZE ? " ANALYZE" : "");
 	else
 		snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
 				 "autovacuum: ANALYZE");
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 3a0b49c..34d8a5c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3210,6 +3210,25 @@ pgstat_report_xact_timestamp(TimestampTz tstamp)
 	pgstat_increment_changecount_after(beentry);
 }
 
+/*-----------
+ * pgstat_report_leader_pid() -
+ *
+ * Report process id of the leader process that this backend is involved
+ * with.
+ */
+void
+pgstat_report_leader_pid(int pid)
+{
+	volatile PgBackendStatus *beentry = MyBEEntry;
+
+	if (!beentry)
+		return;
+
+	pgstat_increment_changecount_before(beentry);
+	beentry->st_leader_pid = pid;
+	pgstat_increment_changecount_after(beentry);
+}
+
 /* ----------
  * pgstat_read_current_status() -
  *
@@ -3610,6 +3629,12 @@ pgstat_get_wait_ipc(WaitEventIPC w)
 		case WAIT_EVENT_PARALLEL_BITMAP_SCAN:
 			event_name = "ParallelBitmapScan";
 			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_PREPARE:
+			event_name = "ParallelVacuumPrepare";
+			break;
+		case WAIT_EVENT_PARALLEL_VACUUM_DONE:
+			event_name = "ParallelVacuumDone";
+			break;
 		case WAIT_EVENT_PROCARRAY_GROUP_UPDATE:
 			event_name = "ProcArrayGroupUpdate";
 			break;
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 82a707a..ead81ff 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -669,7 +669,7 @@ standard_ProcessUtility(PlannedStmt *pstmt,
 				VacuumStmt *stmt = (VacuumStmt *) parsetree;
 
 				/* we choose to allow this during "read only" transactions */
-				PreventCommandDuringRecovery((stmt->options & VACOPT_VACUUM) ?
+				PreventCommandDuringRecovery((stmt->options.flags & VACOPT_VACUUM) ?
 											 "VACUUM" : "ANALYZE");
 				/* forbidden in parallel mode due to CommandIsReadOnly */
 				ExecVacuum(stmt, isTopLevel);
@@ -2498,7 +2498,7 @@ CreateCommandTag(Node *parsetree)
 			break;
 
 		case T_VacuumStmt:
-			if (((VacuumStmt *) parsetree)->options & VACOPT_VACUUM)
+			if (((VacuumStmt *) parsetree)->options.flags & VACOPT_VACUUM)
 				tag = "VACUUM";
 			else
 				tag = "ANALYZE";
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index 8d9e7c1..463f4d2 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -439,7 +439,7 @@ pg_stat_get_backend_idset(PG_FUNCTION_ARGS)
 Datum
 pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 {
-#define PG_STAT_GET_PROGRESS_COLS	PGSTAT_NUM_PROGRESS_PARAM + 3
+#define PG_STAT_GET_PROGRESS_COLS	PGSTAT_NUM_PROGRESS_PARAM + 4
 	int			num_backends = pgstat_fetch_stat_numbackends();
 	int			curr_backend;
 	char	   *cmd = text_to_cstring(PG_GETARG_TEXT_PP(0));
@@ -516,14 +516,16 @@ pg_stat_get_progress_info(PG_FUNCTION_ARGS)
 		if (has_privs_of_role(GetUserId(), beentry->st_userid))
 		{
 			values[2] = ObjectIdGetDatum(beentry->st_progress_command_target);
+			values[3] = Int32GetDatum(beentry->st_leader_pid);
 			for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
-				values[i + 3] = Int64GetDatum(beentry->st_progress_param[i]);
+				values[i + 4] = Int64GetDatum(beentry->st_progress_param[i]);
 		}
 		else
 		{
 			nulls[2] = true;
+			nulls[3] = true;
 			for (i = 0; i < PGSTAT_NUM_PROGRESS_PARAM; i++)
-				nulls[i + 3] = true;
+				nulls[i + 4] = true;
 		}
 
 		tuplestore_putvalues(tupstore, tupdesc, values, nulls);
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 294ab70..159f2e4 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -2040,7 +2040,6 @@ EstimateSnapshotSpace(Snapshot snap)
 	Size		size;
 
 	Assert(snap != InvalidSnapshot);
-	Assert(snap->satisfies == HeapTupleSatisfiesMVCC);
 
 	/* We allocate any XID arrays needed in the same palloc block. */
 	size = add_size(sizeof(SerializedSnapshotData),
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a09c49d..4b44afe 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2008,6 +2008,7 @@ psql_completion(const char *text, int start, int end)
 			"autovacuum_multixact_freeze_max_age",
 			"autovacuum_multixact_freeze_min_age",
 			"autovacuum_multixact_freeze_table_age",
+			"autovacuum_vacuum_parallel_workers",
 			"autovacuum_vacuum_cost_delay",
 			"autovacuum_vacuum_cost_limit",
 			"autovacuum_vacuum_scale_factor",
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index 9f4367d..cbda797 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -132,6 +132,8 @@ extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
 							 Relation relation, Snapshot snapshot);
 extern void heap_parallelscan_reinitialize(ParallelHeapScanDesc parallel_scan);
 extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
+extern void heap_parallelscan_startblock_init(HeapScanDesc scan);
+extern BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
 
 extern bool heap_fetch(Relation relation, Snapshot snapshot,
 		   HeapTuple tuple, Buffer *userbuf, bool keep_buf,
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index 93c031a..5353beb 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -2879,7 +2879,7 @@ DATA(insert OID = 1936 (  pg_stat_get_backend_idset		PGNSP PGUID 12 1 100 0 0 f
 DESCR("statistics: currently active backend IDs");
 DATA(insert OID = 2022 (  pg_stat_get_activity			PGNSP PGUID 12 1 100 0 0 f f f f f t s r 1 0 2249 "23" "{23,26,23,26,25,25,25,25,25,1184,1184,1184,1184,869,25,23,28,28,25,16,25,25,23,16,25}" "{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}" "{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,sslcompression,sslclientdn}" _null_ _null_ pg_stat_get_activity _null_ _null_ _null_ ));
 DESCR("statistics: information about currently active backends");
-DATA(insert OID = 3318 (  pg_stat_get_progress_info			  PGNSP PGUID 12 1 100 0 0 f f f f t t s r 1 0 2249 "25" "{25,23,26,26,20,20,20,20,20,20,20,20,20,20}" "{i,o,o,o,o,o,o,o,o,o,o,o,o,o}" "{cmdtype,pid,datid,relid,param1,param2,param3,param4,param5,param6,param7,param8,param9,param10}" _null_ _null_ pg_stat_get_progress_info _null_ _null_ _null_ ));
+DATA(insert OID = 3318 (  pg_stat_get_progress_info			  PGNSP PGUID 12 1 100 0 0 f f f f t t s r 1 0 2249 "25" "{25,23,26,26,23,20,20,20,20,20,20,20,20,20,20}" "{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o}" "{cmdtype,pid,datid,relid,leader_pid,param1,param2,param3,param4,param5,param6,param7,param8,param9,param10}" _null_ _null_ pg_stat_get_progress_info _null_ _null_ _null_ ));
 DESCR("statistics: information about progress of backends running maintenance command");
 DATA(insert OID = 3099 (  pg_stat_get_wal_senders	PGNSP PGUID 12 1 10 0 0 f f f f f t s r 0 0 2249 "" "{23,25,3220,3220,3220,3220,1186,1186,1186,23,25}" "{o,o,o,o,o,o,o,o,o,o,o}" "{pid,state,sent_lsn,write_lsn,flush_lsn,replay_lsn,write_lag,flush_lag,replay_lag,sync_priority,sync_state}" _null_ _null_ pg_stat_get_wal_senders _null_ _null_ _null_ ));
 DESCR("statistics: information about currently active replication");
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 7a7b793..e7950d5 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,8 @@
 #define VACUUM_H
 
 #include "access/htup.h"
+#include "access/heapam.h"
+#include "access/parallel.h"
 #include "catalog/pg_statistic.h"
 #include "catalog/pg_type.h"
 #include "nodes/parsenodes.h"
@@ -157,7 +159,7 @@ extern int	vacuum_multixact_freeze_table_age;
 
 /* in commands/vacuum.c */
 extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
-extern void vacuum(int options, List *relations, VacuumParams *params,
+extern void vacuum(VacuumOptions options, List *relations, VacuumParams *params,
 	   BufferAccessStrategy bstrategy, bool isTopLevel);
 extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
 				 int *nindexes, Relation **Irel);
@@ -187,8 +189,9 @@ extern void vac_update_datfrozenxid(void);
 extern void vacuum_delay_point(void);
 
 /* in commands/vacuumlazy.c */
-extern void lazy_vacuum_rel(Relation onerel, int options,
+extern void lazy_vacuum_rel(Relation onerel, VacuumOptions options,
 				VacuumParams *params, BufferAccessStrategy bstrategy);
+extern void LazyVacuumWorkerMain(dsm_segment *seg, shm_toc *toc);
 
 /* in commands/analyze.c */
 extern void analyze_rel(Oid relid, RangeVar *relation, int options,
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 732e5d6..a32917b 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -3095,9 +3095,16 @@ typedef enum VacuumOption
 	VACOPT_FULL = 1 << 4,		/* FULL (non-concurrent) vacuum */
 	VACOPT_NOWAIT = 1 << 5,		/* don't wait to get lock (autovacuum only) */
 	VACOPT_SKIPTOAST = 1 << 6,	/* don't process the TOAST table, if any */
-	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7	/* don't skip any pages */
+	VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7,	/* don't skip any pages */
+	VACOPT_PARALLEL = 1 << 8	/* do VACUUM parallelly */
 } VacuumOption;
 
+typedef struct VacuumOptions
+{
+	VacuumOption flags; /* OR of VacuumOption flags */
+	int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;
+
 /*
  * Info about a single target table of VACUUM/ANALYZE.
  *
@@ -3116,7 +3123,7 @@ typedef struct VacuumRelation
 typedef struct VacuumStmt
 {
 	NodeTag		type;
-	int			options;		/* OR of VacuumOption flags */
+	VacuumOptions	options;
 	List	   *rels;			/* list of VacuumRelation, or NIL for all */
 } VacuumStmt;
 
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 089b7c3..5860b4a 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -811,6 +811,8 @@ typedef enum
 	WAIT_EVENT_MQ_SEND,
 	WAIT_EVENT_PARALLEL_FINISH,
 	WAIT_EVENT_PARALLEL_BITMAP_SCAN,
+	WAIT_EVENT_PARALLEL_VACUUM_PREPARE,
+	WAIT_EVENT_PARALLEL_VACUUM_DONE,
 	WAIT_EVENT_PROCARRAY_GROUP_UPDATE,
 	WAIT_EVENT_CLOG_GROUP_UPDATE,
 	WAIT_EVENT_REPLICATION_ORIGIN_DROP,
@@ -1014,13 +1016,17 @@ typedef struct PgBackendStatus
 
 	/*
 	 * Command progress reporting.  Any command which wishes can advertise
-	 * that it is running by setting st_progress_command,
+	 * that it is running by setting st_leaderpid, st_progress_command,
 	 * st_progress_command_target, and st_progress_param[].
 	 * st_progress_command_target should be the OID of the relation which the
 	 * command targets (we assume there's just one, as this is meant for
 	 * utility commands), but the meaning of each element in the
 	 * st_progress_param array is command-specific.
+	 * st_leader_pid can be used for command progress reporting of parallel
+	 * operation. Setting by the leader's pid of parallel operation we can
+	 * group them in progress reporting SQL.
 	 */
+	int			st_leader_pid;
 	ProgressCommandType st_progress_command;
 	Oid			st_progress_command_target;
 	int64		st_progress_param[PGSTAT_NUM_PROGRESS_PARAM];
@@ -1180,6 +1186,7 @@ extern void pgstat_report_activity(BackendState state, const char *cmd_str);
 extern void pgstat_report_tempfile(size_t filesize);
 extern void pgstat_report_appname(const char *appname);
 extern void pgstat_report_xact_timestamp(TimestampTz tstamp);
+extern void pgstat_report_leader_pid(int pid);
 extern const char *pgstat_get_wait_event(uint32 wait_event_info);
 extern const char *pgstat_get_wait_event_type(uint32 wait_event_info);
 extern const char *pgstat_get_backend_current_activity(int pid, bool checkUser);
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index 4bc61e5..2f44d39 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -268,6 +268,7 @@ typedef struct AutoVacOpts
 	int			multixact_freeze_min_age;
 	int			multixact_freeze_max_age;
 	int			multixact_freeze_table_age;
+	int			vacuum_parallel_workers;
 	int			log_min_duration;
 	float8		vacuum_scale_factor;
 	float8		analyze_scale_factor;
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index c440c7e..b43ef0a 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -80,6 +80,9 @@ CONTEXT:  SQL function "do_analyze" statement 1
 SQL function "wrap_do_analyze" statement 1
 VACUUM FULL vactst;
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
+VACUUM (PARALLEL 2, DISABLE_PAGE_SKIPPING) vactst;
+VACUUM (PARALLEL 2, FREEZE) vactst;
 -- partitioned table
 CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);
 CREATE TABLE vacparted1 PARTITION OF vacparted FOR VALUES IN (1);
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 92eaca2..ab9bc4c 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -61,6 +61,9 @@ VACUUM FULL vaccluster;
 VACUUM FULL vactst;
 
 VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
+VACUUM (PARALLEL 2, DISABLE_PAGE_SKIPPING) vactst;
+VACUUM (PARALLEL 2, FREEZE) vactst;
 
 -- partitioned table
 CREATE TABLE vacparted (a int, b char) PARTITION BY LIST (a);

#46

Robert Haas

robertmhaas@gmail.com

about 8 years ago

In reply to: Masahiko Sawada (#43)

Re: Block level parallel vacuum WIP

On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

Thank you for letting me know.

Hmm, it's an interesting failure. I'll investigate it and post the new patch.

Did you ever find out what the cause of this problem was?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#47

Thomas Munro

thomas.munro@enterprisedb.com

about 8 years ago

In reply to: Robert Haas (#46)

Re: Block level parallel vacuum WIP

On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

Thank you for letting me know.

Hmm, it's an interesting failure. I'll investigate it and post the new patch.

Did you ever find out what the cause of this problem was?

I wonder if it might have been the same issue that commit
19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later.

--
Thomas Munro
http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#48

Amit Langote

Langote_Amit_f8@lab.ntt.co.jp

about 8 years ago

In reply to: Thomas Munro (#47)

Re: Block level parallel vacuum WIP

On 2017/10/22 5:25, Thomas Munro wrote:

On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

Thank you for letting me know.

Hmm, it's an interesting failure. I'll investigate it and post the new patch.

Did you ever find out what the cause of this problem was?

I wonder if it might have been the same issue that commit
19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later.

Hmm, 19de0ab23ccba seems to prevent the "cache lookup failed for relation
XXX" error in a different code path though (the code path handling manual
vacuum). Not sure if the commit could have prevented that error being
emitted by DROP SCHEMA ... CASCADE, which seems to be what produced it in
this case. Maybe I'm missing something though.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#49

Masahiko Sawada

sawada.mshk@gmail.com

about 8 years ago

In reply to: Amit Langote (#48)

Re: Block level parallel vacuum WIP

On Mon, Oct 23, 2017 at 10:43 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:

On 2017/10/22 5:25, Thomas Munro wrote:

On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

Thank you for letting me know.

Hmm, it's an interesting failure. I'll investigate it and post the new patch.

Did you ever find out what the cause of this problem was?

I wonder if it might have been the same issue that commit
19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later.

Hmm, 19de0ab23ccba seems to prevent the "cache lookup failed for relation
XXX" error in a different code path though (the code path handling manual
vacuum). Not sure if the commit could have prevented that error being
emitted by DROP SCHEMA ... CASCADE, which seems to be what produced it in
this case. Maybe I'm missing something though.

Yeah, I was thinking the commit is relevant with this issue but as
Amit mentioned this error is emitted by DROP SCHEMA CASCASE.
I don't find out the cause of this issue yet. With the previous
version patch, autovacuum workers were woking with one parallel worker
but it never drops relations. So it's possible that the error might
not have been relevant with the patch but anywayI'll continue to work
on that.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

#50

Michael Paquier

michael.paquier@gmail.com

about 8 years ago

In reply to: Masahiko Sawada (#49)

Re: [HACKERS] Block level parallel vacuum WIP

On Tue, Oct 24, 2017 at 5:54 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:

Yeah, I was thinking the commit is relevant with this issue but as
Amit mentioned this error is emitted by DROP SCHEMA CASCASE.
I don't find out the cause of this issue yet. With the previous
version patch, autovacuum workers were woking with one parallel worker
but it never drops relations. So it's possible that the error might
not have been relevant with the patch but anywayI'll continue to work
on that.

This depends on the extension lock patch from
/messages/by-id/CAD21AoCmT3cFQUN4aVvzy5chw7DuzXrJCbrjTU05B+Ss=Gn1LA@mail.gmail.com/
if I am following correctly. So I propose to mark this patch as
returned with feedback for now, and come back to it once the root
problems are addressed. Feel free to correct me if you think that's
not adapted.
--
Michael