Block level parallel vacuum WIP
Hi all,
I'd like to propose block level parallel VACUUM.
This feature allows VACUUM to use multiple CPU cores.
Vacuum Processing Logic
===================
PostgreSQL's VACUUM processing consists of two phases:
1. Collecting dead tuple locations on the heap.
2. Reclaiming dead tuples from the heap and indexes.
Phases 1 and 2 are executed alternately: once the collected dead tuple
locations fill maintenance_work_mem in phase 1, phase 2 runs, and then
phase 1 resumes.
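For illustration, here is a rough C-style sketch of that alternation; the
helper names (collect_dead_tuples, vacuum_one_index, vacuum_heap_pages) are
placeholders, not the actual functions in vacuumlazy.c.

    /*
     * Simplified single-process flow: phase 1 fills a TID array sized by
     * maintenance_work_mem; when it is full, phase 2 cleans indexes and heap.
     */
    for (blkno = 0; blkno < nblocks; blkno++)
    {
        /* Phase 1: collect dead tuple TIDs from this heap page */
        collect_dead_tuples(onerel, blkno, dead_tuples, &num_dead_tuples);

        if (num_dead_tuples >= max_dead_tuples)
        {
            /* Phase 2: remove collected TIDs from each index, then the heap */
            for (i = 0; i < nindexes; i++)
                vacuum_one_index(Irel[i], dead_tuples, num_dead_tuples);
            vacuum_heap_pages(onerel, dead_tuples, num_dead_tuples);
            num_dead_tuples = 0;    /* resume phase 1 */
        }
    }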
Basic Design
==========
As a PoC, I implemented parallel vacuum so that each worker processes
both phase 1 and phase 2 for a particular block range.
Suppose we vacuum a 1000-block table with 4 workers: each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from the heap and indexes (phase 2).
To use the visibility map efficiently, each worker scans its particular
block range of the relation and collects dead tuple locations.
After every worker has finished its task, the leader process gathers the
vacuum statistics and updates relfrozenxid if possible.
I also changed the buffer lock infrastructure so that multiple
processes can wait for a cleanup lock on a buffer.
A new GUC parameter, vacuum_parallel_workers, controls the number
of vacuum workers.
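Concretely, the per-worker block range boils down to the following
calculation (condensed from vacuum_worker() in the attached 0002 patch):

    /* Per-worker range calculation, as in the PoC's vacuum_worker() */
    BlockNumber nblocks_total = RelationGetNumberOfBlocks(rel);
    BlockNumber nblocks_per_worker = nblocks_total / parallel_vacuum_workers;
    BlockNumber begin = nblocks_per_worker * ParallelWorkerNumber;
    BlockNumber nblocks;

    /* The last worker also picks up the remainder blocks */
    if (ParallelWorkerNumber == parallel_vacuum_workers - 1)
        nblocks = nblocks_total - begin;
    else
        nblocks = nblocks_per_worker;

    /* Each worker then runs both phases over [begin, begin + nblocks) */
    lazy_scan_heap(rel, options, wstats, Irel, nindexes, aggressive,
                   begin, nblocks);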
Performance(PoC)
=========
I ran parallel vacuum on a 13GB table (pgbench scale 1000) with several
workers (on my poor virtual machine).
The results are:
1. Vacuum of the whole table without indexes (page skipping disabled)
1 worker : 33 sec
2 workers : 27 sec
3 workers : 23 sec
4 workers : 22 sec
2. Vacuum of the table and indexes (after 10,000 transactions were executed)
1 worker : 12 sec
2 workers : 49 sec
3 workers : 54 sec
4 workers : 53 sec
In my test, the execution time of parallel vacuum got worse with indexes
because multiple processes frequently try to acquire the cleanup lock on
the same index buffer.
So far it seems effective only for table-only vacuum, and even that did
not improve as much as expected (probably a disk bottleneck).
Another Design
============
ISTM that processing index vacuum with multiple processes is not a good
idea in most cases, because many index items can be stored in a single
page and multiple vacuum workers could try to acquire the cleanup lock
on the same index buffer.
It is probably better that multiple workers each process a particular
block range of the heap, and then one worker per index processes the
index vacuum.
There is still a lot of work to do, but a PoC patch is attached.
Feedback and suggestions are very welcome.
Regards,
--
Masahiko Sawada
Attachments:
0001-Allow-muliple-backends-to-wait-for-cleanup-lock.patch (text/plain)
From b25d491a05a43fb7adf014b2580c71ec7adb75a2 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 8 Aug 2016 16:43:35 -0700
Subject: [PATCH 1/2] Allow muliple backends to wait for cleanup lock.
---
src/backend/storage/buffer/buf_init.c | 3 +-
src/backend/storage/buffer/bufmgr.c | 57 +++++++++++++++++++++++------------
src/include/storage/buf_internals.h | 4 ++-
src/include/storage/proc.h | 2 ++
4 files changed, 45 insertions(+), 21 deletions(-)
diff --git a/src/backend/storage/buffer/buf_init.c b/src/backend/storage/buffer/buf_init.c
index a4163cf..2aad030 100644
--- a/src/backend/storage/buffer/buf_init.c
+++ b/src/backend/storage/buffer/buf_init.c
@@ -134,7 +134,8 @@ InitBufferPool(void)
CLEAR_BUFFERTAG(buf->tag);
pg_atomic_init_u32(&buf->state, 0);
- buf->wait_backend_pid = 0;
+ dlist_init(&buf->pin_count_waiters);
+ pg_atomic_write_u32(&buf->nwaiters, 0);
buf->buf_id = i;
diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
index 76ade37..f2f4ab9 100644
--- a/src/backend/storage/buffer/bufmgr.c
+++ b/src/backend/storage/buffer/bufmgr.c
@@ -38,6 +38,7 @@
#include "catalog/storage.h"
#include "executor/instrument.h"
#include "lib/binaryheap.h"
+#include "lib/ilist.h"
#include "miscadmin.h"
#include "pg_trace.h"
#include "pgstat.h"
@@ -1730,15 +1731,19 @@ UnpinBuffer(BufferDesc *buf, bool fixOwner)
*/
buf_state = LockBufHdr(buf);
- if ((buf_state & BM_PIN_COUNT_WAITER) &&
- BUF_STATE_GET_REFCOUNT(buf_state) == 1)
+ if (buf_state & BM_PIN_COUNT_WAITER)
{
- /* we just released the last pin other than the waiter's */
- int wait_backend_pid = buf->wait_backend_pid;
+ dlist_mutable_iter iter;
- buf_state &= ~BM_PIN_COUNT_WAITER;
+ if (pg_atomic_read_u32(&buf->nwaiters) == 1)
+ buf_state &= ~BM_PIN_COUNT_WAITER;
+
+ dlist_foreach_modify(iter, &buf->pin_count_waiters)
+ {
+ PGPROC *waiter = dlist_container(PGPROC, clWaitLink, iter.cur);
+ ProcSendSignal(waiter->pid);
+ }
UnlockBufHdr(buf, buf_state);
- ProcSendSignal(wait_backend_pid);
}
else
UnlockBufHdr(buf, buf_state);
@@ -3513,8 +3518,17 @@ UnlockBuffers(void)
* got a cancel/die interrupt before getting the signal.
*/
if ((buf_state & BM_PIN_COUNT_WAITER) != 0 &&
- buf->wait_backend_pid == MyProcPid)
- buf_state &= ~BM_PIN_COUNT_WAITER;
+ pg_atomic_read_u32(&buf->nwaiters) == 1)
+ {
+ dlist_mutable_iter iter;
+
+ dlist_foreach_modify(iter, &buf->pin_count_waiters)
+ {
+ PGPROC *waiter = dlist_container(PGPROC, clWaitLink, iter.cur);
+ if (waiter->pid == MyProcPid)
+ buf_state &= ~BM_PIN_COUNT_WAITER;
+ }
+ }
UnlockBufHdr(buf, buf_state);
@@ -3616,20 +3630,24 @@ LockBufferForCleanup(Buffer buffer)
buf_state = LockBufHdr(bufHdr);
Assert(BUF_STATE_GET_REFCOUNT(buf_state) > 0);
- if (BUF_STATE_GET_REFCOUNT(buf_state) == 1)
+ /*
+ * If refcount == 1 then we can break immediately.
+ * If refcount > 1, we can break when refcount == (nwaiters + 1),
+ * because refcount includes ourselves as well as the other processes,
+ * while nwaiters includes only the other processes.
+ */
+ if (BUF_STATE_GET_REFCOUNT(buf_state) == 1 ||
+ ((BUF_STATE_GET_REFCOUNT(buf_state) - 1)==
+ pg_atomic_read_u32(&bufHdr->nwaiters)))
{
/* Successfully acquired exclusive lock with pincount 1 */
UnlockBufHdr(bufHdr, buf_state);
return;
}
/* Failed, so mark myself as waiting for pincount 1 */
- if (buf_state & BM_PIN_COUNT_WAITER)
- {
- UnlockBufHdr(bufHdr, buf_state);
- LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
- elog(ERROR, "multiple backends attempting to wait for pincount 1");
- }
- bufHdr->wait_backend_pid = MyProcPid;
+ pg_atomic_fetch_add_u32(&bufHdr->nwaiters, 1);
+ dlist_push_tail(&bufHdr->pin_count_waiters, &MyProc->clWaitLink);
+
PinCountWaitBuf = bufHdr;
buf_state |= BM_PIN_COUNT_WAITER;
UnlockBufHdr(bufHdr, buf_state);
@@ -3662,9 +3680,10 @@ LockBufferForCleanup(Buffer buffer)
* better be safe.
*/
buf_state = LockBufHdr(bufHdr);
- if ((buf_state & BM_PIN_COUNT_WAITER) != 0 &&
- bufHdr->wait_backend_pid == MyProcPid)
- buf_state &= ~BM_PIN_COUNT_WAITER;
+
+ dlist_delete(&MyProc->clWaitLink);
+ pg_atomic_fetch_sub_u32(&bufHdr->nwaiters, 1);
+
UnlockBufHdr(bufHdr, buf_state);
PinCountWaitBuf = NULL;
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index e0dfb2f..90fcbd7 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -182,7 +182,9 @@ typedef struct BufferDesc
/* state of the tag, containing flags, refcount and usagecount */
pg_atomic_uint32 state;
- int wait_backend_pid; /* backend PID of pin-count waiter */
+ dlist_head pin_count_waiters; /* list of pin-count waiters (PGPROC) */
+ pg_atomic_uint32 nwaiters;
+
int freeNext; /* link in freelist chain */
LWLock content_lock; /* to lock access to buffer contents */
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index f576f05..4cd9416 100644
--- a/src/include/storage/proc.h
+++ b/src/include/storage/proc.h
@@ -123,6 +123,8 @@ struct PGPROC
LOCKMASK heldLocks; /* bitmask for lock types already held on this
* lock object by this backend */
+ dlist_node clWaitLink; /* position in Cleanup Lock wait list */
+
/*
* Info to allow us to wait for synchronous replication, if needed.
* waitLSN is InvalidXLogRecPtr if not waiting; set only by user backend.
--
2.8.1
0002-Block-level-parallel-Vacuum.patch (text/plain)
From 1955cb9f68f9c027c566c4909d7ead475cf20f3b Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Mon, 8 Aug 2016 16:43:55 -0700
Subject: [PATCH 2/2] Block level parallel Vacuum.
---
src/backend/access/nbtree/nbtutils.c | 19 ---
src/backend/commands/vacuum.c | 1 +
src/backend/commands/vacuumlazy.c | 239 ++++++++++++++++++++++++++++-------
src/backend/utils/misc/guc.c | 10 ++
src/include/commands/vacuum.h | 41 +++++-
src/include/storage/buf_internals.h | 1 +
6 files changed, 245 insertions(+), 66 deletions(-)
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index 5d335c7..987aceb 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -1918,25 +1918,6 @@ _bt_start_vacuum(Relation rel)
if (result == 0 || result > MAX_BT_CYCLE_ID)
result = btvacinfo->cycle_ctr = 1;
- /* Let's just make sure there's no entry already for this index */
- for (i = 0; i < btvacinfo->num_vacuums; i++)
- {
- vac = &btvacinfo->vacuums[i];
- if (vac->relid.relId == rel->rd_lockInfo.lockRelId.relId &&
- vac->relid.dbId == rel->rd_lockInfo.lockRelId.dbId)
- {
- /*
- * Unlike most places in the backend, we have to explicitly
- * release our LWLock before throwing an error. This is because
- * we expect _bt_end_vacuum() to be called before transaction
- * abort cleanup can run to release LWLocks.
- */
- LWLockRelease(BtreeVacuumLock);
- elog(ERROR, "multiple active vacuums for index \"%s\"",
- RelationGetRelationName(rel));
- }
- }
-
/* OK, add an entry */
if (btvacinfo->num_vacuums >= btvacinfo->max_vacuums)
{
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0563e63..1562773 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -58,6 +58,7 @@ int vacuum_freeze_min_age;
int vacuum_freeze_table_age;
int vacuum_multixact_freeze_min_age;
int vacuum_multixact_freeze_table_age;
+int parallel_vacuum_workers;
/* A few variables that don't seem worth passing around as parameters */
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index 231e92d..4fc880d 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -42,8 +42,10 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
+#include "access/parallel.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
+#include "access/xact.h"
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/storage.h"
@@ -98,33 +100,9 @@
*/
#define SKIP_PAGES_THRESHOLD ((BlockNumber) 32)
-typedef struct LVRelStats
-{
- /* hasindex = true means two-pass strategy; false means one-pass */
- bool hasindex;
- /* Overall statistics about rel */
- BlockNumber old_rel_pages; /* previous value of pg_class.relpages */
- BlockNumber rel_pages; /* total number of pages */
- BlockNumber scanned_pages; /* number of pages we examined */
- BlockNumber pinskipped_pages; /* # of pages we skipped due to a pin */
- BlockNumber frozenskipped_pages; /* # of frozen pages we skipped */
- double scanned_tuples; /* counts only tuples on scanned pages */
- double old_rel_tuples; /* previous value of pg_class.reltuples */
- double new_rel_tuples; /* new estimated total # of tuples */
- double new_dead_tuples; /* new estimated total # of dead tuples */
- BlockNumber pages_removed;
- double tuples_deleted;
- BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
- /* List of TIDs of tuples we intend to delete */
- /* NB: this list is ordered by TID address */
- int num_dead_tuples; /* current # of entries */
- int max_dead_tuples; /* # slots allocated in array */
- ItemPointer dead_tuples; /* array of ItemPointerData */
- int num_index_scans;
- TransactionId latestRemovedXid;
- bool lock_waiter_detected;
-} LVRelStats;
-
+/* DSM key for block-level parallel vacuum */
+#define VACUUM_KEY_TASK 50
+#define VACUUM_KEY_WORKER_STATS 51
/* A few variables that don't seem worth passing around as parameters */
static int elevel = -1;
@@ -137,9 +115,6 @@ static BufferAccessStrategy vac_strategy;
/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
- LVRelStats *vacrelstats, Relation *Irel, int nindexes,
- bool aggressive);
static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
static void lazy_vacuum_index(Relation indrel,
@@ -162,6 +137,18 @@ static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
+/* functions for parallel vacuum */
+static void parallel_lazy_scan_heap(Relation rel, LVRelStats *vacrelstats,
+ Relation *Irel, int nindexes, int options,
+ bool aggressive, int wnum);
+static void vacuum_worker(dsm_segment *seg, shm_toc *toc);
+static void lazy_scan_heap(Relation onerel, int options,
+ LVRelStats *vacrelstats, Relation *Irel,
+ int nindexes, bool aggressive,
+ BlockNumber begin, BlockNumber nblocks);
+static void gather_vacuum_stats(LVRelStats *valrelstats, LVRelStats *worker_stats,
+ int wnum);
+
/*
* lazy_vacuum_rel() -- perform LAZY VACUUM for one heap relation
@@ -248,8 +235,17 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
vacrelstats->hasindex = (nindexes > 0);
- /* Do the vacuuming */
- lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
+ /* Do the parallel vacuuming. */
+ if (parallel_vacuum_workers > 1)
+ parallel_lazy_scan_heap(onerel, vacrelstats, Irel, nindexes, options,
+ aggressive, parallel_vacuum_workers);
+ else
+ {
+ BlockNumber nblocks = RelationGetNumberOfBlocks(onerel);
+
+ lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes,
+ aggressive, 0, nblocks);
+ }
/* Done with indexes */
vac_close_indexes(nindexes, Irel, NoLock);
@@ -428,7 +424,132 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
}
/*
- * lazy_scan_heap() -- scan an open heap relation
+ * Launch the parallel vacuum workers specified by parallel_vacuum_workers and
+ * then gather the resulting stats from each worker. The idea of vacuuming one
+ * relation with multiple workers in parallel is that each worker is assigned a
+ * particular block range of the relation, calculated from
+ * parallel_vacuum_workers and the number of relation blocks. This information
+ * and some thresholds (e.g. OldestXmin, FreezeLimit, MultiXactCutoff) are
+ * stored into DSM tagged by VACUUM_KEY_TASK. Each worker collects garbage TIDs
+ * and reclaims them as well. Vacuum statistics for each worker are stored into
+ * DSM tagged by VACUUM_KEY_WORKER_STATS and are gathered by the leader process
+ * after all workers have finished their tasks.
+ */
+static void
+parallel_lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
+ Relation *Irel, int nindexes, int options,
+ bool aggressive, int wnum)
+{
+ ParallelContext *pcxt;
+ LVRelStats *wstats_space;
+ VacuumTask *task_space;
+ IndexBulkDeleteResult **indstats;
+ int size = 0;
+ int i;
+
+ EnterParallelMode();
+
+ /* Create parallel context and initialize it */
+ pcxt = CreateParallelContext(vacuum_worker, wnum);
+ size += BUFFERALIGN(sizeof(VacuumTask)); /* For task */
+ size += BUFFERALIGN(sizeof(LVRelStats) * pcxt->nworkers); /* For worker stats */
+ shm_toc_estimate_chunk(&pcxt->estimator, size);
+ shm_toc_estimate_keys(&pcxt->estimator, 2);
+ InitializeParallelDSM(pcxt);
+
+ /* Prepare for VacuumTask space */
+ task_space = (VacuumTask *)shm_toc_allocate(pcxt->toc, sizeof(VacuumTask));
+ shm_toc_insert(pcxt->toc, VACUUM_KEY_TASK, task_space);
+ task_space->relid = RelationGetRelid(onerel);
+ task_space->aggressive = aggressive;
+ task_space->options = options;
+ task_space->oldestxmin = OldestXmin;
+ task_space->freezelimit = FreezeLimit;
+ task_space->multixactcutoff = MultiXactCutoff;
+ task_space->wnum = wnum;
+ task_space->elevel = elevel;
+
+ /* Prepare for worker LVRelStats space */
+ wstats_space = (LVRelStats *)shm_toc_allocate(pcxt->toc,
+ sizeof(LVRelStats) * pcxt->nworkers);
+ shm_toc_insert(pcxt->toc, VACUUM_KEY_WORKER_STATS, wstats_space);
+ for (i = 0; i < pcxt->nworkers; i++)
+ {
+ LVRelStats *wstats = wstats_space + sizeof(LVRelStats) * i;
+ memcpy(wstats, vacrelstats, sizeof(LVRelStats));
+ }
+
+ /* Do parallel vacuum */
+ LaunchParallelWorkers(pcxt);
+
+ /* Wait for the workers to finish vacuuming */
+ WaitForParallelWorkersToFinish(pcxt);
+ gather_vacuum_stats(vacrelstats, wstats_space, wnum);
+
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+
+ indstats = (IndexBulkDeleteResult **)
+ palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
+
+ /* Do post-vacuum cleanup and statistics update for each index */
+ for (i = 0; i < nindexes; i++)
+ lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+}
+
+/*
+ * Entry function for a parallel vacuum worker. Each worker calculates its
+ * starting block number and the number of blocks it needs to process, and
+ * then vacuums that particular block range of the relation.
+ */
+static void
+vacuum_worker(dsm_segment *seg, shm_toc *toc)
+{
+ VacuumTask *task;
+ LVRelStats *wstats_space;
+ LVRelStats *wstats;
+ Relation rel;
+ BlockNumber begin;
+ BlockNumber nblocks_per_worker;
+ BlockNumber nblocks;
+ int nindexes;
+ Relation *Irel;
+
+ /* Set up task information */
+ task = (VacuumTask *)shm_toc_lookup(toc, VACUUM_KEY_TASK);
+ OldestXmin = task->oldestxmin;
+ FreezeLimit = task->freezelimit;
+ MultiXactCutoff = task->multixactcutoff;
+
+ /* Set up message queue */
+ wstats_space = (LVRelStats *)shm_toc_lookup(toc, VACUUM_KEY_WORKER_STATS);
+ wstats = wstats_space + sizeof(LVRelStats) * ParallelWorkerNumber;
+
+ /* Calculate how many blocks the worker should process */
+ rel = heap_open(task->relid, NoLock);
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes, &Irel);
+ nblocks_per_worker = RelationGetNumberOfBlocks(rel) / parallel_vacuum_workers;
+ begin = nblocks_per_worker * ParallelWorkerNumber;
+
+ /* The last worker processes remaining blocks */
+ if (ParallelWorkerNumber == (task->wnum - 1))
+ nblocks = RelationGetNumberOfBlocks(rel) - begin;
+ else
+ nblocks = nblocks_per_worker;
+
+ /* Set up elevel */
+ elevel = task->elevel;
+
+ /* Do vacuuming particular area */
+ lazy_scan_heap(rel, task->options, wstats, Irel, nindexes,
+ task->aggressive, begin, nblocks);
+
+ heap_close(rel, NoLock);
+ vac_close_indexes(nindexes, Irel, NoLock);
+}
+
+/*
+ * lazy_scan_heap() -- scan a particular block range of an open heap relation
*
* This routine prunes each page in the heap, which will among other
* things truncate dead tuples to dead line pointers, defragment the
@@ -445,10 +566,10 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
*/
static void
lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
- Relation *Irel, int nindexes, bool aggressive)
+ Relation *Irel, int nindexes, bool aggressive,
+ BlockNumber begin, BlockNumber nblocks)
{
- BlockNumber nblocks,
- blkno;
+ BlockNumber blkno;
HeapTupleData tuple;
char *relname;
BlockNumber empty_pages,
@@ -471,14 +592,15 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
PROGRESS_VACUUM_MAX_DEAD_TUPLES
};
int64 initprog_val[3];
+ BlockNumber end = begin + nblocks;
pg_rusage_init(&ru0);
relname = RelationGetRelationName(onerel);
ereport(elevel,
- (errmsg("vacuuming \"%s.%s\"",
+ (errmsg("vacuuming \"%s.%s\", from block %u to %u, %u blocks",
get_namespace_name(RelationGetNamespace(onerel)),
- relname)));
+ relname, begin, end, nblocks)));
empty_pages = vacuumed_pages = 0;
num_tuples = tups_vacuumed = nkeep = nunused = 0;
@@ -486,7 +608,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
indstats = (IndexBulkDeleteResult **)
palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
- nblocks = RelationGetNumberOfBlocks(onerel);
vacrelstats->rel_pages = nblocks;
vacrelstats->scanned_pages = 0;
vacrelstats->nonempty_pages = 0;
@@ -545,10 +666,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
* the last page. This is worth avoiding mainly because such a lock must
* be replayed on any hot standby, where it can be disruptive.
*/
- next_unskippable_block = 0;
+ next_unskippable_block = begin;
if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
{
- while (next_unskippable_block < nblocks)
+ while (next_unskippable_block < end)
{
uint8 vmstatus;
@@ -574,7 +695,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
else
skipping_blocks = false;
- for (blkno = 0; blkno < nblocks; blkno++)
+ for (blkno = begin; blkno < end; blkno++)
{
Buffer buf;
Page page;
@@ -1306,10 +1427,6 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_INDEX_CLEANUP);
- /* Do post-vacuum cleanup and statistics update for each index */
- for (i = 0; i < nindexes; i++)
- lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
-
/* If no indexes, make log report that lazy_vacuum_heap would've made */
if (vacuumed_pages)
ereport(elevel,
@@ -1317,6 +1434,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
RelationGetRelationName(onerel),
tups_vacuumed, vacuumed_pages)));
+ /* Do post-vacuum cleanup and statistics update for each index */
+ if (!IsParallelWorker())
+ {
+ for (i = 0; i < nindexes; i++)
+ lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+ }
+
/*
* This is pretty messy, but we split it up so that we can skip emitting
* individual parts of the message when not applicable.
@@ -1347,6 +1471,29 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
pfree(buf.data);
}
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+gather_vacuum_stats(LVRelStats *vacrelstats, LVRelStats *worker_stats, int wnum)
+{
+ int i;
+
+ /* Gather each worker stats */
+ for (i = 0; i < wnum; i++)
+ {
+ LVRelStats *wstats = worker_stats + sizeof(LVRelStats) * i;
+
+ vacrelstats->rel_pages += wstats->rel_pages;
+ vacrelstats->scanned_pages += wstats->scanned_pages;
+ vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+ vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+ vacrelstats->scanned_tuples += wstats->scanned_tuples;
+ vacrelstats->new_rel_tuples += wstats->new_rel_tuples;
+ vacrelstats->pages_removed += wstats->pages_removed;
+ vacrelstats->nonempty_pages += wstats->nonempty_pages;
+ }
+}
/*
* lazy_vacuum_heap() -- second pass over the heap
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index c5178f7..0dd64bc 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2661,6 +2661,16 @@ static struct config_int ConfigureNamesInt[] =
},
{
+ {"parallel_vacuum_workers", PGC_USERSET, RESOURCES_ASYNCHRONOUS,
+ gettext_noop("Sets the number of parallel workers for vacuum."),
+ NULL
+ },
+ &parallel_vacuum_workers,
+ 1, 1, 1024,
+ NULL, NULL, NULL
+ },
+
+ {
{"autovacuum_work_mem", PGC_SIGHUP, RESOURCES_MEM,
gettext_noop("Sets the maximum memory to be used by each autovacuum worker process."),
NULL,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 80cd4a8..fc46c09 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -147,6 +147,45 @@ typedef struct VacuumParams
* activated, -1 to use default */
} VacuumParams;
+typedef struct LVRelStats
+{
+ /* hasindex = true means two-pass strategy; false means one-pass */
+ bool hasindex;
+ /* Overall statistics about rel */
+ BlockNumber old_rel_pages; /* previous value of pg_class.relpages */
+ BlockNumber rel_pages; /* total number of pages */
+ BlockNumber scanned_pages; /* number of pages we examined */
+ BlockNumber pinskipped_pages; /* # of pages we skipped due to a pin */
+ BlockNumber frozenskipped_pages; /* # of frozen pages we skipped */
+ double scanned_tuples; /* counts only tuples on scanned pages */
+ double old_rel_tuples; /* previous value of pg_class.reltuples */
+ double new_rel_tuples; /* new estimated total # of tuples */
+ double new_dead_tuples; /* new estimated total # of dead tuples */
+ BlockNumber pages_removed;
+ double tuples_deleted;
+ BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
+ /* List of TIDs of tuples we intend to delete */
+ /* NB: this list is ordered by TID address */
+ int num_dead_tuples; /* current # of entries */
+ int max_dead_tuples; /* # slots allocated in array */
+ ItemPointer dead_tuples; /* array of ItemPointerData */
+ int num_index_scans;
+ TransactionId latestRemovedXid;
+ bool lock_waiter_detected;
+} LVRelStats;
+
+typedef struct VacuumTask
+{
+ Oid relid; /* Target relation oid */
+ bool aggressive; /* does each worker need to do an aggressive vacuum? */
+ int options; /* Specified vacuum options */
+ TransactionId oldestxmin;
+ TransactionId freezelimit;
+ MultiXactId multixactcutoff;
+ int wnum;
+ int elevel;
+} VacuumTask;
+
/* GUC parameters */
extern PGDLLIMPORT int default_statistics_target; /* PGDLLIMPORT for
* PostGIS */
@@ -154,7 +193,7 @@ extern int vacuum_freeze_min_age;
extern int vacuum_freeze_table_age;
extern int vacuum_multixact_freeze_min_age;
extern int vacuum_multixact_freeze_table_age;
-
+extern int parallel_vacuum_workers;
/* in commands/vacuum.c */
extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
diff --git a/src/include/storage/buf_internals.h b/src/include/storage/buf_internals.h
index 90fcbd7..4f9d986 100644
--- a/src/include/storage/buf_internals.h
+++ b/src/include/storage/buf_internals.h
@@ -15,6 +15,7 @@
#ifndef BUFMGR_INTERNALS_H
#define BUFMGR_INTERNALS_H
+#include "lib/ilist.h"
#include "storage/buf.h"
#include "storage/bufmgr.h"
#include "storage/latch.h"
--
2.8.1
I repeated your test on a ProLiant DL580 Gen9 with a Xeon E7-8890 v3:
pgbench -s 100, then VACUUM pgbench_accounts after 10,000 transactions.
with: alter system set vacuum_cost_delay to DEFAULT;
parallel_vacuum_workers | time
1 | 138.703,263 ms
2 | 83.751,064 ms
4 | 66.105,861 ms
8 | 59.820,171 ms
with: alter system set vacuum_cost_delay to 1;
parallel_vacuum_workers | time
1 | 127.210,896 ms
2 | 75.300,278 ms
4 | 64.253,087 ms
8 | 60.130,953
---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.ru
The Russian Postgres Company
On Tue, Aug 23, 2016 at 8:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
2. Vacuum table and index (after 10000 transaction executed)
1 worker : 12 sec
2 workers : 49 sec
3 workers : 54 sec
4 workers : 53 sec
As a result of my test, since multiple process could frequently try to
acquire the cleanup lock on same index buffer, execution time of
parallel vacuum got worse.
And it seems to be effective for only table vacuum so far, but is not
improved as expected (maybe disk bottleneck).
Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (one per
worker).
Claudio Freire <klaussfreire@gmail.com> writes:
Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (one per
worker).
What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.
regards, tom lane
On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Claudio Freire <klaussfreire@gmail.com> writes:
Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (one per
worker).
What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.
+1
We could eventually need some effective way of parallelizing vacuum of a
single index.
But pointing each worker at a separate index seems fair enough for the
majority of cases.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound good performance-wise. I recall emails from
Robert and Amit on the matter for sequential scan, saying that this would
suck performance out, particularly for rotating disks.
--
Michael
On 23.08.2016 15:41, Michael Paquier wrote:
On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.
Rotating disks are not a problem - you can always RAID them, etc. The
problem is the 8k allocation per relation once per half an hour; a seq
scan done this way = a random scan...
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.
The implementation in the patch is the same as what we initially thought
of for sequential scan, but it turned out not to be a good way to do it
because it can lead to an inappropriate balance of work among workers.
Suppose one worker finishes its work early; it won't be able to do more.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'd like to propose block level parallel VACUUM.
This feature makes VACUUM possible to use multiple CPU cores.
Great. This is something that I have thought about, too. Andres and
Heikki recommended it as a project to me a few PGCons ago.
As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
To use visibility map efficiency, each worker scan particular block
range of relation and collect dead tuple locations.
After each worker finished task, the leader process gathers these
vacuum statistics information and update relfrozenxid if possible.
This doesn't seem like a good design, because it adds a lot of extra
index scanning work. What I think you should do is:
1. Use a parallel heap scan (heap_beginscan_parallel) to let all
workers scan in parallel. Allocate a DSM segment to store the control
structure for this parallel scan plus an array for the dead tuple IDs
and a lock to protect the array.
2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.
Later, we can try to see if there's a way to have multiple workers
work together to vacuum a single index. But the above seems like a
good place to start.
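For illustration, the shared state implied by steps 1 and 2 might look
roughly like the following; the struct and field names are hypothetical,
not from any posted patch:

    /*
     * Hypothetical DSM contents for the design above (names illustrative):
     * one shm_toc entry holds the ParallelHeapScanDesc set up with
     * heap_parallelscan_initialize(), another holds a shared dead-tuple
     * array such as this, protected by a lock.
     */
    typedef struct LVSharedDeadTuples
    {
        slock_t     mutex;              /* protects the fields below */
        int         num_dead_tuples;    /* current number of entries */
        int         max_dead_tuples;    /* allocated slots */
        ItemPointerData dead_tuples[FLEXIBLE_ARRAY_MEMBER];
    } LVSharedDeadTuples;

Each worker would then attach to the shared scan with
heap_beginscan_parallel() and append the TIDs it finds under the lock.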
I also changed the buffer lock infrastructure so that multiple
processes can wait for cleanup lock on a buffer.
You won't need this if you proceed as above, which is probably a good thing.
And the new GUC parameter vacuum_parallel_workers controls the number
of vacuum workers.
I suspect that for autovacuum there is little reason to use parallel
vacuum, since most of the time we are trying to slow vacuum down, not
speed it up. I'd be inclined, for starters, to just add a PARALLEL
option to the VACUUM command, for when people want to speed up
parallel vacuums. Perhaps
VACUUM (PARALLEL 4) relation;
...could mean to vacuum the relation with the given number of workers, and:
VACUUM (PARALLEL) relation;
...could mean to vacuum the relation in parallel with the system
choosing the number of workers - 1 worker per index is probably a good
starting formula, though it might need some refinement.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Aug 23, 2016 at 9:40 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
On Tue, Aug 23, 2016 at 3:32 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Claudio Freire <klaussfreire@gmail.com> writes:
Not only that, but from your description (I haven't read the patch,
sorry), you'd be scanning the whole index multiple times (one per
worker).What about pointing each worker at a separate index? Obviously the
degree of concurrency during index cleanup is then limited by the
number of indexes, but that doesn't seem like a fatal problem.
+1
We could eventually need some effective way of parallelizing vacuum of
single index.
But pointing each worker at separate index seems to be fair enough for
majority of cases.
Or we can improve vacuuming of a single index by changing the data
representation of dead tuples to a bitmap.
That can reduce the number of whole-index scans during vacuum and make
comparing index items against the dead tuples faster.
This is listed on the TODO list, and I've implemented it.
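As a rough illustration of that idea (the layout and names here are
hypothetical, not the actual implementation), a per-block bitmap turns the
per-TID binary search done by lazy_tid_reaped() into a constant-time bit
test:

    /*
     * Hypothetical dense bitmap of dead tuples: one bit per possible line
     * pointer of every heap block, replacing the sorted ItemPointer array.
     */
    #define DT_BITS_PER_WORD 32
    #define DT_WORDS_PER_BLOCK \
        ((MaxHeapTuplesPerPage + DT_BITS_PER_WORD - 1) / DT_BITS_PER_WORD)

    typedef struct DeadTupleBitmap
    {
        BlockNumber nblocks;                      /* heap blocks covered */
        uint32      words[FLEXIBLE_ARRAY_MEMBER]; /* nblocks * DT_WORDS_PER_BLOCK */
    } DeadTupleBitmap;

    static inline bool
    dead_tuple_is_set(DeadTupleBitmap *dt, ItemPointer itemptr)
    {
        BlockNumber  blk = ItemPointerGetBlockNumber(itemptr);
        OffsetNumber off = ItemPointerGetOffsetNumber(itemptr);
        uint32       word = dt->words[blk * DT_WORDS_PER_BLOCK +
                                      (off - 1) / DT_BITS_PER_WORD];

        return (word & ((uint32) 1 << ((off - 1) % DT_BITS_PER_WORD))) != 0;
    }

The trade-off is memory: the dense bitmap spends space even on blocks with
no dead tuples, whereas the TID array only grows with the number of dead
tuples collected.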
Regards,
--
Masahiko Sawada
Robert Haas wrote:
2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.
At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree. So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Aug 23, 2016 at 10:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Aug 23, 2016 at 7:02 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'd like to propose block level parallel VACUUM.
This feature makes VACUUM possible to use multiple CPU cores.
Great. This is something that I have thought about, too. Andres and
Heikki recommended it as a project to me a few PGCons ago.
As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
To use visibility map efficiency, each worker scan particular block
range of relation and collect dead tuple locations.
After each worker finished task, the leader process gathers these
vacuum statistics information and update relfrozenxid if possible.
This doesn't seem like a good design, because it adds a lot of extra
index scanning work. What I think you should do is:
1. Use a parallel heap scan (heap_beginscan_parallel) to let all
workers scan in parallel. Allocate a DSM segment to store the control
structure for this parallel scan plus an array for the dead tuple IDs
and a lock to protect the array.
2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.
Later, we can try to see if there's a way to have multiple workers
work together to vacuum a single index. But the above seems like a
good place to start.
Thank you for the advice.
That is what I had thought of as an alternative design; I will change the
patch to this design.
I also changed the buffer lock infrastructure so that multiple
processes can wait for cleanup lock on a buffer.
You won't need this if you proceed as above, which is probably a good thing.
Right.
And the new GUC parameter vacuum_parallel_workers controls the number
of vacuum workers.
I suspect that for autovacuum there is little reason to use parallel
vacuum, since most of the time we are trying to slow vacuum down, not
speed it up. I'd be inclined, for starters, to just add a PARALLEL
option to the VACUUM command, for when people want to speed up
parallel vacuums. Perhaps
VACUUM (PARALLEL 4) relation;
...could mean to vacuum the relation with the given number of workers, and:
VACUUM (PARALLEL) relation;
...could mean to vacuum the relation in parallel with the system
choosing the number of workers - 1 worker per index is probably a good
starting formula, though it might need some refinement.
That looks convenient.
I was thinking that we could manage the number of parallel workers per
table for autovacuum using a parameter like
ALTER TABLE relation SET (parallel_vacuum_workers = 2)
Regards,
--
Masahiko Sawada
On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.
At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree. So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.
Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.
At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree. So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.
Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.
Sure, just putting the idea out there.
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-08-23 12:17:30 -0400, Robert Haas wrote:
On Tue, Aug 23, 2016 at 11:17 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
Robert Haas wrote:
2. When you finish the heap scan, or when the array of dead tuple IDs
is full (or very nearly full?), perform a cycle of index vacuuming.
For now, have each worker process a separate index; extra workers just
wait. Perhaps use the condition variable patch that I posted
previously to make the workers wait. Then resume the parallel heap
scan, if not yet done.
At least btrees should easily be scannable in parallel, given that we
process them in physical order rather than logically walk the tree. So
if there are more workers than indexes, it's possible to put more than
one worker on the same index by carefully indicating each to stop at a
predetermined index page number.
Well that's fine if we figure it out, but I wouldn't try to include it
in the first patch. Let's make VACUUM parallel one step at a time.
Given that index scan(s) are, in my experience, way more often the
bottleneck than the heap-scan(s), I'm not sure that order is the
best. The heap-scan benefits from the VM, the index scans don't.
On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.
The implementation in patch is same as we have initially thought for
sequential scan, but turned out that it is not good way to do because
it can lead to inappropriate balance of work among workers. Suppose
one worker is able to finish it's work, it won't be able to do more.
Ah, so it was the reason. Thanks for confirming my doubts on what is proposed.
--
Michael
On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada <sawada.mshk@gmail.com>
wrote:
As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.
The implementation in patch is same as we have initially thought for
sequential scan, but turned out that it is not good way to do because
it can lead to inappropriate balance of work among workers. Suppose
one worker is able to finish it's work, it won't be able to do more.
Ah, so it was the reason. Thanks for confirming my doubts on what is
proposed.
--
I believe Sawada-san has got enough feedback on the design to work out the
next steps. It seems natural that the vacuum workers are assigned a portion
of the heap to scan and collect dead tuples (similar to what the patch does)
and that the same workers are responsible for the second phase of the heap
scan.
But as far as index scans are concerned, I agree with Tom that the best
strategy is to assign a different index to each worker process and let them
vacuum indexes in parallel. That way the work for each worker process is
clearly cut out and they don't contend for the same resources, which means
the first patch, to allow multiple backends to wait for a cleanup lock on a
buffer, is not required. Later we could extend it further so that multiple
workers can vacuum a single index by splitting the work on physical
boundaries, but even that will ensure a clear demarcation of work and hence
no contention on index blocks.
ISTM this will require further work, and it probably makes sense to mark the
patch as "Returned with feedback" for now.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Sep 10, 2016 at 7:44 PM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
On Wed, Aug 24, 2016 at 3:31 AM, Michael Paquier <michael.paquier@gmail.com>
wrote:
On Tue, Aug 23, 2016 at 10:50 PM, Amit Kapila <amit.kapila16@gmail.com>
wrote:
On Tue, Aug 23, 2016 at 6:11 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Tue, Aug 23, 2016 at 8:02 PM, Masahiko Sawada
<sawada.mshk@gmail.com> wrote:
As for PoC, I implemented parallel vacuum so that each worker
processes both 1 and 2 phases for particular block range.
Suppose we vacuum 1000 blocks table with 4 workers, each worker
processes 250 consecutive blocks in phase 1 and then reclaims dead
tuples from heap and indexes (phase 2).
So each worker is assigned a range of blocks, and processes them in
parallel? This does not sound performance-wise. I recall Robert and
Amit emails on the matter for sequential scan that this would suck
performance out particularly for rotating disks.
The implementation in patch is same as we have initially thought for
sequential scan, but turned out that it is not good way to do because
it can lead to inappropriate balance of work among workers. Suppose
one worker is able to finish it's work, it won't be able to do more.
Ah, so it was the reason. Thanks for confirming my doubts on what is
proposed.
--
I believe Sawada-san has got enough feedback on the design to work out the
next steps. It seems natural that the vacuum workers are assigned a portion
of the heap to scan and collect dead tuples (similar to what patch does) and
the same workers to be responsible for the second phase of heap scan.
Yeah, thank you for the feedback.
But as far as index scans are concerned, I agree with Tom that the best
strategy is to assign a different index to each worker process and let them
vacuum indexes in parallel.
That way the work for each worker process is
clearly cut out and they don't contend for the same resources, which means
the first patch to allow multiple backends to wait for a cleanup buffer is
not required. Later we could extend it further such multiple workers can
vacuum a single index by splitting the work on physical boundaries, but even
that will ensure clear demarkation of work and hence no contention on index
blocks.
I also agree with this idea.
Each worker vacuums different indexes, and then the leader process
should update all index statistics after parallel mode is exited.
I'm implementing this patch, but I need to resolve a problem regarding
the relation extension lock with multiple parallel workers.
In parallel vacuum, multiple workers could try to acquire the exclusive
extension lock on the same relation.
Since acquiring the exclusive extension lock by multiple workers is
regarded as locking from the same locking group, multiple workers can
extend the FSM or VM at the same time and end up with an error.
I thought this might also be relevant to parallel update operations, so
I'd like to discuss it in advance.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'm implementing this patch but I need to resolve the problem
regarding lock for extension by multiple parallel workers.
In parallel vacuum, multiple workers could try to acquire the
exclusive lock for extension on same relation.
Since acquiring the exclusive lock for extension by multiple workers
is regarded as locking from same locking group, multiple workers
extend fsm or vm at the same time and end up with error.
I thought that it might be involved with parallel update operation, so
I'd like to discuss about this in advance.
Hmm, yeah. This is one of the reasons why parallel queries currently
need to be entirely read-only. I think there's a decent argument that
the relation extension lock mechanism should be entirely redesigned:
the current system is neither particularly fast nor particularly
elegant, and some of the services that the heavyweight lock manager
provides, such as deadlock detection, are not relevant for relation
extension locks. I'm not sure if we should try to fix that right away
or come up with some special-purpose hack for vacuum, such as having
backends use condition variables to take turns calling
visibilitymap_set().
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 15, 2016 at 11:44 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Sep 15, 2016 at 7:21 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I'm implementing this patch but I need to resolve the problem
regarding lock for extension by multiple parallel workers.
In parallel vacuum, multiple workers could try to acquire the
exclusive lock for extension on same relation.
Since acquiring the exclusive lock for extension by multiple workers
is regarded as locking from same locking group, multiple workers
extend fsm or vm at the same time and end up with error.
I thought that it might be involved with parallel update operation, so
I'd like to discuss about this in advance.
Hmm, yeah. This is one of the reasons why parallel queries currently
need to be entirely read-only. I think there's a decent argument that
the relation extension lock mechanism should be entirely redesigned:
the current system is neither particularly fast nor particularly
elegant, and some of the services that the heavyweight lock manager
provides, such as deadlock detection, are not relevant for relation
extension locks. I'm not sure if we should try to fix that right away
or come up with some special-purpose hack for vacuum, such as having
backends use condition variables to take turns calling
visibilitymap_set().
Yeah, I don't have a good solution for this problem so far.
We might need to improve the group locking mechanism for updating
operations or come up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
enough VM and FSM in advance in order to avoid extension of the fork
relations by parallel workers, but that doesn't resolve the fundamental
problem.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yeah, I don't have a good solution for this problem so far.
We might need to improve the group locking mechanism for updating
operations, or come up with another approach to resolve this problem.
For example, one possible idea is that the launcher process extends
the vm and fsm forks sufficiently in advance, so that parallel workers
never need to extend those forks themselves, but that doesn't resolve
the fundamental problem.
Marked as returned with feedback because of lack of activity and...
Feedback provided.
--
Michael
On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yeah, I don't have a good solution for this problem so far.
We might need to improve the group locking mechanism for updating
operations, or come up with another approach to resolve this problem.
For example, one possible idea is that the launcher process extends
the vm and fsm forks sufficiently in advance, so that parallel workers
never need to extend those forks themselves, but that doesn't resolve
the fundamental problem.
I got some advice at PGConf.ASIA 2016 and have started to work on this again.
The biggest problem so far is the group locking. As I mentioned
before, parallel vacuum workers could try to extend the same
visibility map page at the same time. So we either need to make group
locking conflict in some cases, or eliminate the need to acquire the
extension lock at all. The attached 000 patch takes the former
approach: it makes group locking conflict between parallel workers
when a worker tries to acquire a relation extension lock. I'm not
sure this is the best idea, but it's very simple and sufficient to
support parallel vacuum. A smarter approach may be needed when we
want to support parallel DML and so on.
The 001 patch adds a PARALLEL option to the VACUUM command. As Robert
suggested before, the option takes the parallel degree:
=# VACUUM (PARALLEL 4) table_name;
...means that 4 background worker processes are launched and each of
them executes lazy_scan_heap, while the launcher (leader) process
waits for all vacuum workers to finish. With N = 1 or without the
PARALLEL option, the leader process executes lazy_scan_heap itself.
Internal Design
=============
I changed the internal design of parallel vacuum. Garbage collection
on the table is parallelized at the block level. For a table with
indexes, each index is assigned to one vacuum worker, and all garbage
in that index is processed by its assigned worker.
The space for each worker's array of dead tuple TIDs is allocated in
dynamic shared memory by the launcher process. A vacuum worker stores
dead tuple locations into its own dead tuple array without locking,
and the TIDs within each array are ordered by TID. Note that the
entire dead tuple space, i.e. the collection of per-worker arrays, is
not globally ordered.
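To make that layout concrete, a worker could locate its private TID
array like this (a sketch only; it assumes the LVDeadTuple struct and
the headers-then-arrays layout used in the attached 001 patch):

/*
 * DSM layout assumed here:
 *   [LVDeadTuple 0] ... [LVDeadTuple nworkers-1]
 *   [TID array 0]   ... [TID array nworkers-1]
 */
static ItemPointer
worker_dead_tuple_array(LVDeadTuple *dead_tuples, int nworkers,
                        long max_dead_tuples, int worker_number)
{
    char   *base = (char *) dead_tuples;

    /* Skip the LVDeadTuple headers of all workers... */
    base += sizeof(LVDeadTuple) * nworkers;
    /* ...then the TID arrays of the lower-numbered workers. */
    base += sizeof(ItemPointerData) * max_dead_tuples * worker_number;

    return (ItemPointer) base;
}

Each worker only ever writes into its own array, which is why no lock
is needed while collecting dead tuples.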
If the table has indexes, all dead tuple TIDs need to be shared with
all vacuum workers before the actual reclaiming of dead tuples starts,
and this data should be cleared only after every vacuum worker has
finished using it. So I put two synchronization points: one just
before reclaiming dead tuples and one just after reclaiming them. At
these points each parallel vacuum worker waits for all other workers
to reach the same point; once they all have, each worker resumes with
the next operation.
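One of those synchronization points could be sketched as a simple
counting barrier over the shared worker-state array. Field and macro
names follow the attached 001 patch; the round tracking and the
COMPLETE state handled by the real patch are omitted here, and the
wait event is just a placeholder.

/* requires storage/condition_variable.h, storage/spin.h, pgstat.h */
static void
wait_until_all_prepared(LVParallelState *pstate)
{
    int     nworkers = pstate->nworkers;

    /* Advertise that this worker has finished its scan phase. */
    SpinLockAcquire(&pstate->mutex);
    pstate->vacworkers[ParallelWorkerNumber].state = VACSTATE_VACUUM_PREPARED;
    SpinLockRelease(&pstate->mutex);
    ConditionVariableBroadcast(&pstate->cv);

    /* Wait until every worker has reached (or passed) the same point. */
    for (;;)
    {
        int     nprepared = 0;
        int     i;

        SpinLockAcquire(&pstate->mutex);
        for (i = 0; i < nworkers; i++)
        {
            if (pstate->vacworkers[i].state & VACPHASE_VACUUM)
                nprepared++;
        }
        SpinLockRelease(&pstate->mutex);

        if (nprepared == nworkers)
            break;
        ConditionVariableSleep(&pstate->cv, WAIT_EVENT_PARALLEL_FINISH);
    }
    ConditionVariableCancelSleep();
}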
For example, if a table has five indexes and we execute parallel lazy
vacuum on it with three vacuum workers, two of the three workers are
assigned two indexes each and the remaining worker is assigned one
index. Once the dead tuples accumulated by the vacuum workers reach
the maintenance_work_mem limit, each worker starts to reclaim dead
tuples from the table and its assigned indexes. The worker that is
assigned only one index finishes (probably first) and sleeps until
the other two workers finish vacuuming. If the table has no indexes,
each parallel vacuum worker vacuums each page as it goes.
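The assignment itself is just round-robin by index number, which is
what the IsAssignedIndex() macro in the attached patch boils down to;
a minimal sketch:

/* Does index number i belong to this worker? Round-robin assignment. */
static inline bool
index_is_mine(int i, int nworkers, int my_worker_number)
{
    return (i % nworkers) == my_worker_number;
}

With five indexes and three workers, worker 0 gets indexes 0 and 3,
worker 1 gets indexes 1 and 4, and worker 2 gets index 2, which is
the two/two/one split described above.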
Performance
===========
I measured the execution time of vacuum on a dirty table with several
parallel degrees in my poor environment.
table_size | indexes | parallel_degree | time
------------+---------+-----------------+----------
6.5GB | 0 | 1 | 00:00:14
6.5GB | 0 | 2 | 00:00:02
6.5GB | 0 | 4 | 00:00:02
6.5GB | 1 | 1 | 00:00:13
6.5GB | 1 | 2 | 00:00:15
6.5GB | 1 | 4 | 00:00:18
6.5GB | 2 | 1 | 00:02:18
6.5GB | 2 | 2 | 00:00:38
6.5GB | 2 | 4 | 00:00:46
13GB | 0 | 1 | 00:03:52
13GB | 0 | 2 | 00:00:49
13GB | 0 | 4 | 00:00:50
13GB | 1 | 1 | 00:01:41
13GB | 1 | 2 | 00:01:59
13GB | 1 | 4 | 00:01:24
13GB | 2 | 1 | 00:12:42
13GB | 2 | 2 | 00:01:17
13GB | 2 | 4 | 00:02:12
In my measurements, vacuum execution time got better in some cases
but didn't improve in the cases with one index. I'll investigate the
cause.
ToDo
======
* Vacuum progress support.
* Storage parameter support, perhaps a parallel_vacuum_workers
parameter that allows autovacuum to use parallel vacuum on a
specified table.
I have registered this for the next CF.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
000_make_group_locking_conflict_extend_lock_v2.patch (application/octet-stream)
diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c
index e9703f1..dd27acf 100644
--- a/src/backend/storage/lmgr/lock.c
+++ b/src/backend/storage/lmgr/lock.c
@@ -1354,6 +1354,17 @@ LockCheckConflicts(LockMethod lockMethodTable,
}
/*
+ * If relation lock for extend, it's a conflict even in
+ * group locking.
+ */
+ if ((lock->tag).locktag_type == LOCKTAG_RELATION_EXTEND)
+ {
+ PROCLOCK_PRINT("LockCheckConflicts: conflicting (group)",
+ proclock);
+ return STATUS_FOUND;
+ }
+
+ /*
* Locks held in conflicting modes by members of our own lock group are
* not real conflicts; we can subtract those out and see if we still have
* a conflict. This is O(N) in the number of processes holding or
001_parallel_vacuum_v2.patch (application/octet-stream)
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index f18180a..8f1dc7b 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
<refsynopsisdiv>
<synopsis>
-VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
+VACUUM [ ( { FULL | FREEZE | VERBOSE | ANALYZE | PARALLEL <replaceable class="PARAMETER">N</replaceable> | DISABLE_PAGE_SKIPPING } [, ...] ) ] [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ <replaceable class="PARAMETER">table_name</replaceable> ]
VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">table_name</replaceable> [ (<replaceable class="PARAMETER">column_name</replaceable> [, ...] ) ] ]
</synopsis>
@@ -130,6 +130,20 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] ANALYZE [ <replaceable class="PARAMETER">
</varlistentry>
<varlistentry>
+ <term><literal>PARALLEL <replaceable class="PARAMETER">N</replaceable></literal></term>
+ <listitem>
+ <para>
+ Execute <command>VACUUM</command> in parallel by <replaceable class="PARAMETER">N
+ </replaceable> background workers. Garbage collection on the table is processed
+ in block-level parallel. For tables with indexes, parallel vacuum assigns each
+ index to a parallel vacuum worker and all garbage in an index is processed
+ by that worker. This option cannot be used with the <literal>FULL</>
+ option.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
<term><literal>DISABLE_PAGE_SKIPPING</literal></term>
<listitem>
<para>
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1ce42ea..ff3f8d8 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -88,7 +88,6 @@ static HeapScanDesc heap_beginscan_internal(Relation relation,
bool is_bitmapscan,
bool is_samplescan,
bool temp_snap);
-static BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
static HeapTuple heap_prepare_insert(Relation relation, HeapTuple tup,
TransactionId xid, CommandId cid, int options);
static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
@@ -1670,7 +1669,7 @@ heap_beginscan_parallel(Relation relation, ParallelHeapScanDesc parallel_scan)
* first backend gets an InvalidBlockNumber return.
* ----------------
*/
-static BlockNumber
+BlockNumber
heap_parallelscan_nextpage(HeapScanDesc scan)
{
BlockNumber page = InvalidBlockNumber;
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index 0f72c1c..bfcb77a 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -71,7 +71,7 @@ static void vac_truncate_clog(TransactionId frozenXID,
MultiXactId minMulti,
TransactionId lastSaneFrozenXid,
MultiXactId lastSaneMinMulti);
-static bool vacuum_rel(Oid relid, RangeVar *relation, int options,
+static bool vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options,
VacuumParams *params);
/*
@@ -86,17 +86,17 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
VacuumParams params;
/* sanity checks on options */
- Assert(vacstmt->options & (VACOPT_VACUUM | VACOPT_ANALYZE));
- Assert((vacstmt->options & VACOPT_VACUUM) ||
- !(vacstmt->options & (VACOPT_FULL | VACOPT_FREEZE)));
- Assert((vacstmt->options & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
- Assert(!(vacstmt->options & VACOPT_SKIPTOAST));
+ Assert(vacstmt->options.flags & (VACOPT_VACUUM | VACOPT_ANALYZE));
+ Assert((vacstmt->options.flags & VACOPT_VACUUM) ||
+ !(vacstmt->options.flags & (VACOPT_FULL | VACOPT_FREEZE)));
+ Assert((vacstmt->options.flags & VACOPT_ANALYZE) || vacstmt->va_cols == NIL);
+ Assert(!(vacstmt->options.flags & VACOPT_SKIPTOAST));
/*
* All freeze ages are zero if the FREEZE option is given; otherwise pass
* them as -1 which means to use the default values.
*/
- if (vacstmt->options & VACOPT_FREEZE)
+ if (vacstmt->options.flags & VACOPT_FREEZE)
{
params.freeze_min_age = 0;
params.freeze_table_age = 0;
@@ -145,7 +145,7 @@ ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel)
* memory context that will not disappear at transaction commit.
*/
void
-vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
+vacuum(VacuumOptions options, RangeVar *relation, Oid relid, VacuumParams *params,
List *va_cols, BufferAccessStrategy bstrategy, bool isTopLevel)
{
const char *stmttype;
@@ -156,7 +156,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
Assert(params != NULL);
- stmttype = (options & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
+ stmttype = (options.flags & VACOPT_VACUUM) ? "VACUUM" : "ANALYZE";
/*
* We cannot run VACUUM inside a user transaction block; if we were inside
@@ -166,7 +166,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
*
* ANALYZE (without VACUUM) can run either way.
*/
- if (options & VACOPT_VACUUM)
+ if (options.flags & VACOPT_VACUUM)
{
PreventTransactionChain(isTopLevel, stmttype);
in_outer_xact = false;
@@ -188,17 +188,26 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
/*
* Sanity check DISABLE_PAGE_SKIPPING option.
*/
- if ((options & VACOPT_FULL) != 0 &&
- (options & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
+ if ((options.flags & VACOPT_FULL) != 0 &&
+ (options.flags & VACOPT_DISABLE_PAGE_SKIPPING) != 0)
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
errmsg("VACUUM option DISABLE_PAGE_SKIPPING cannot be used with FULL")));
/*
+ * Sanity check PARALLEL option.
+ */
+ if ((options.flags & VACOPT_FULL) != 0 &&
+ (options.flags & VACOPT_PARALLEL) != 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("VACUUM option PARALLEL cannot be used with FULL")));
+
+ /*
* Send info about dead objects to the statistics collector, unless we are
* in autovacuum --- autovacuum.c does this for itself.
*/
- if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+ if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
pgstat_vacuum_stat();
/*
@@ -244,11 +253,11 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
* transaction block, and also in an autovacuum worker, use own
* transactions so we can release locks sooner.
*/
- if (options & VACOPT_VACUUM)
+ if (options.flags & VACOPT_VACUUM)
use_own_xacts = true;
else
{
- Assert(options & VACOPT_ANALYZE);
+ Assert(options.flags & VACOPT_ANALYZE);
if (IsAutoVacuumWorkerProcess())
use_own_xacts = true;
else if (in_outer_xact)
@@ -298,13 +307,13 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
{
Oid relid = lfirst_oid(cur);
- if (options & VACOPT_VACUUM)
+ if (options.flags & VACOPT_VACUUM)
{
if (!vacuum_rel(relid, relation, options, params))
continue;
}
- if (options & VACOPT_ANALYZE)
+ if (options.flags & VACOPT_ANALYZE)
{
/*
* If using separate xacts, start one for analyze. Otherwise,
@@ -317,7 +326,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
PushActiveSnapshot(GetTransactionSnapshot());
}
- analyze_rel(relid, relation, options, params,
+ analyze_rel(relid, relation, options.flags, params,
va_cols, in_outer_xact, vac_strategy);
if (use_own_xacts)
@@ -353,7 +362,7 @@ vacuum(int options, RangeVar *relation, Oid relid, VacuumParams *params,
StartTransactionCommand();
}
- if ((options & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
+ if ((options.flags & VACOPT_VACUUM) && !IsAutoVacuumWorkerProcess())
{
/*
* Update pg_database.datfrozenxid, and truncate pg_clog if possible.
@@ -1183,7 +1192,7 @@ vac_truncate_clog(TransactionId frozenXID,
* At entry and exit, we are not inside a transaction.
*/
static bool
-vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
+vacuum_rel(Oid relid, RangeVar *relation, VacuumOptions options, VacuumParams *params)
{
LOCKMODE lmode;
Relation onerel;
@@ -1204,7 +1213,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
*/
PushActiveSnapshot(GetTransactionSnapshot());
- if (!(options & VACOPT_FULL))
+ if (!(options.flags & VACOPT_FULL))
{
/*
* In lazy vacuum, we can set the PROC_IN_VACUUM flag, which lets
@@ -1244,7 +1253,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
* vacuum, but just ShareUpdateExclusiveLock for concurrent vacuum. Either
* way, we can be sure that no other backend is vacuuming the same table.
*/
- lmode = (options & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
+ lmode = (options.flags & VACOPT_FULL) ? AccessExclusiveLock : ShareUpdateExclusiveLock;
/*
* Open the relation and get the appropriate lock on it.
@@ -1255,7 +1264,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
* If we've been asked not to wait for the relation lock, acquire it first
* in non-blocking mode, before calling try_relation_open().
*/
- if (!(options & VACOPT_NOWAIT))
+ if (!(options.flags & VACOPT_NOWAIT))
onerel = try_relation_open(relid, lmode);
else if (ConditionalLockRelationOid(relid, lmode))
onerel = try_relation_open(relid, NoLock);
@@ -1359,7 +1368,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
* us to process it. In VACUUM FULL, though, the toast table is
* automatically rebuilt by cluster_rel so we shouldn't recurse to it.
*/
- if (!(options & VACOPT_SKIPTOAST) && !(options & VACOPT_FULL))
+ if (!(options.flags & VACOPT_SKIPTOAST) && !(options.flags & VACOPT_FULL))
toast_relid = onerel->rd_rel->reltoastrelid;
else
toast_relid = InvalidOid;
@@ -1378,7 +1387,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
/*
* Do the actual work --- either FULL or "lazy" vacuum
*/
- if (options & VACOPT_FULL)
+ if (options.flags & VACOPT_FULL)
{
/* close relation before vacuuming, but hold lock until commit */
relation_close(onerel, NoLock);
@@ -1386,7 +1395,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
/* VACUUM FULL is now a variant of CLUSTER; see cluster.c */
cluster_rel(relid, InvalidOid, false,
- (options & VACOPT_VERBOSE) != 0);
+ (options.flags & VACOPT_VERBOSE) != 0);
}
else
lazy_vacuum_rel(onerel, options, params, vac_strategy);
@@ -1440,8 +1449,7 @@ vacuum_rel(Oid relid, RangeVar *relation, int options, VacuumParams *params)
* hit dangling index pointers.
*/
void
-vac_open_indexes(Relation relation, LOCKMODE lockmode,
- int *nindexes, Relation **Irel)
+vac_open_indexes(Relation relation, LOCKMODE lockmode, int *nindexes, Relation **Irel)
{
List *indexoidlist;
ListCell *indexoidscan;
diff --git a/src/backend/commands/vacuumlazy.c b/src/backend/commands/vacuumlazy.c
index a2999b3..b5e6eed 100644
--- a/src/backend/commands/vacuumlazy.c
+++ b/src/backend/commands/vacuumlazy.c
@@ -23,6 +23,22 @@
* of index scans performed. So we don't use maintenance_work_mem memory for
* the TID array, just enough to hold as many heap tuples as fit on one page.
*
+ * In PostgreSQL 10, we support a parallel option for lazy vacuum. In parallel
+ * lazy vacuum, multiple vacuum worker processes fetch page numbers in parallel
+ * using a parallel heap scan and process them. The memory for each worker's
+ * array of dead tuple TIDs is allocated in dynamic shared memory in advance
+ * by the launcher process. Each vacuum worker has a vacuum state and a round
+ * counter. Since the vacuum state is cyclical, the round value indicates how
+ * many laps the worker has completed so far; a worker increments its round
+ * after finishing the reclaim phase. For tables with indexes, each index is
+ * assigned to one vacuum worker, so the number of indexes assigned can differ
+ * between workers. Because the dead tuple TIDs need to be shared with all
+ * vacuum workers in order to reclaim index garbage, and must be cleared only
+ * after all workers have finished the reclaim phase, the workers synchronize
+ * at two points: just before the reclaim phase begins and just after it
+ * finishes. After all vacuum workers have finished, the launcher process
+ * gathers the lazy vacuum statistics and updates them.
*
* Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -42,8 +58,11 @@
#include "access/heapam_xlog.h"
#include "access/htup_details.h"
#include "access/multixact.h"
+#include "access/parallel.h"
+#include "access/relscan.h"
#include "access/transam.h"
#include "access/visibilitymap.h"
+#include "access/xact.h"
#include "access/xlog.h"
#include "catalog/catalog.h"
#include "catalog/storage.h"
@@ -55,6 +74,7 @@
#include "portability/instr_time.h"
#include "postmaster/autovacuum.h"
#include "storage/bufmgr.h"
+#include "storage/condition_variable.h"
#include "storage/freespace.h"
#include "storage/lmgr.h"
#include "utils/lsyscache.h"
@@ -98,10 +118,72 @@
*/
#define SKIP_PAGES_THRESHOLD ((BlockNumber) 32)
+/* DSM key for parallel lazy vacuum */
+#define VACUUM_KEY_PARALLEL_SCAN 50
+#define VACUUM_KEY_VACUUM_STATS 51
+#define VACUUM_KEY_INDEX_STATS 52
+#define VACUUM_KEY_DEAD_TUPLES 53
+#define VACUUM_KEY_VACUUM_TASK 54
+#define VACUUM_KEY_PARALLEL_STATE 55
+
+/*
+ * see note of lazy_scan_heap_get_nextpage about forcing scanning of
+ * last page
+ */
+#define FORCE_CHECK_PAGE(blk) \
+ (blkno == (blk - 1) && should_attempt_truncation(vacrelstats))
+
+/* Check if given index is assigned to this parallel vacuum worker */
+#define IsAssignedIndex(i_num, nworkers) \
+ (!IsParallelWorker() ||\
+ (IsParallelWorker() && ((i_num) % (nworkers) == ParallelWorkerNumber)))
+
+/* Data structure for updating index relation statistics */
+typedef struct LVIndStats
+{
+ bool do_update; /* Launcher process will update? */
+ BlockNumber rel_pages; /* # of index pages */
+ BlockNumber rel_tuples; /* # of index tuples */
+} LVIndStats;
+
+/* Vacuum worker state for parallel lazy vacuum */
+#define VACSTATE_STARTUP 0x01 /* startup state */
+#define VACSTATE_SCANNING 0x02 /* heap scan phase */
+#define VACSTATE_VACUUM_PREPARED 0x04 /* finished to scan heap */
+#define VACSTATE_VACUUMING 0x08 /* vacuuming on table and index */
+#define VACSTATE_VACUUM_FINISHED 0x10 /* finished to vacuum */
+#define VACSTATE_COMPLETE 0x20 /* complete to vacuum */
+
+/* Vacuum phase for parallel lazy vacuum */
+#define VACPHASE_SCAN\
+ (VACSTATE_SCANNING | VACSTATE_VACUUM_PREPARED)
+#define VACPHASE_VACUUM \
+ (VACSTATE_VACUUM_PREPARED | VACSTATE_VACUUMING | VACSTATE_VACUUM_FINISHED)
+
+typedef struct VacWorker
+{
+ uint8 state; /* current state of vacuum worker */
+ uint32 round; /* current laps */
+} VacWorker;
+
+typedef struct LVParallelState
+{
+ int nworkers; /* # of parallel vacuum workers */
+ ConditionVariable cv; /* condition variable for making synchronization points*/
+ slock_t mutex; /* mutex for vacworkers state */
+ VacWorker vacworkers[FLEXIBLE_ARRAY_MEMBER];
+ /* each vacuum workers state follows */
+} LVParallelState;
+
+typedef struct LVDeadTuple
+{
+ int n_dt; /* # of dead tuple */
+ ItemPointer dt_array; /* NB: Each list is ordered by TID address */
+} LVDeadTuple;
+
typedef struct LVRelStats
{
- /* hasindex = true means two-pass strategy; false means one-pass */
- bool hasindex;
+ int nindexes; /* > 0 means two-pass strategy; = 0 means one-pass */
/* Overall statistics about rel */
BlockNumber old_rel_pages; /* previous value of pg_class.relpages */
BlockNumber rel_pages; /* total number of pages */
@@ -116,15 +198,42 @@ typedef struct LVRelStats
double tuples_deleted;
BlockNumber nonempty_pages; /* actually, last nonempty page + 1 */
/* List of TIDs of tuples we intend to delete */
- /* NB: this list is ordered by TID address */
- int num_dead_tuples; /* current # of entries */
+ LVDeadTuple *dead_tuples;
int max_dead_tuples; /* # slots allocated in array */
- ItemPointer dead_tuples; /* array of ItemPointerData */
int num_index_scans;
TransactionId latestRemovedXid;
bool lock_waiter_detected;
+ /* Fields for parallel lazy vacuum */
+ LVIndStats *vacindstats;
+ LVParallelState *pstate;
} LVRelStats;
+/*
+ * Scan description data for lazy vacuum. In parallel lazy vacuum,
+ * we use only heapscan instead.
+ */
+typedef struct LVScanDescData
+{
+ BlockNumber lv_cblock; /* current scanning block number */
+ BlockNumber lv_next_unskippable_block; /* next block number we cannot skip */
+ BlockNumber lv_nblocks; /* the number blocks of relation */
+ HeapScanDesc heapscan; /* field for parallel lazy vacuum */
+} LVScanDescData;
+typedef struct LVScanDescData *LVScanDesc;
+
+/*
+ * Vacuum relevant options and thresholds we need share with parallel
+ * vacuum workers.
+ */
+typedef struct VacuumTask
+{
+ int options; /* VACUUM options */
+ bool aggressive; /* does each worker need to aggressive vacuum? */
+ TransactionId oldestxmin;
+ TransactionId freezelimit;
+ MultiXactId multixactcutoff;
+ int elevel;
+} VacuumTask;
/* A few variables that don't seem worth passing around as parameters */
static int elevel = -1;
@@ -135,11 +244,10 @@ static MultiXactId MultiXactCutoff;
static BufferAccessStrategy vac_strategy;
+static LVDeadTuple *MyDeadTuple = NULL; /* pointer to my dead tuple space */
+static VacWorker *MyVacWorker = NULL; /* pointer to my vacuum worker state */
/* non-export function prototypes */
-static void lazy_scan_heap(Relation onerel, int options,
- LVRelStats *vacrelstats, Relation *Irel, int nindexes,
- bool aggressive);
static void lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats);
static bool lazy_check_needs_freeze(Buffer buf, bool *hastup);
static void lazy_vacuum_index(Relation indrel,
@@ -147,21 +255,54 @@ static void lazy_vacuum_index(Relation indrel,
LVRelStats *vacrelstats);
static void lazy_cleanup_index(Relation indrel,
IndexBulkDeleteResult *stats,
- LVRelStats *vacrelstats);
+ LVRelStats *vacrelstats,
+ LVIndStats *vacindstats);
static int lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
- int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
+ int tupindex, LVRelStats *vacrelstats, Buffer *vmbuffer);
static bool should_attempt_truncation(LVRelStats *vacrelstats);
static void lazy_truncate_heap(Relation onerel, LVRelStats *vacrelstats);
static BlockNumber count_nondeletable_pages(Relation onerel,
LVRelStats *vacrelstats);
static void lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks);
-static void lazy_record_dead_tuple(LVRelStats *vacrelstats,
- ItemPointer itemptr);
+static void lazy_record_dead_tuple(LVRelStats *vacrelstats, ItemPointer itemptr);
static bool lazy_tid_reaped(ItemPointer itemptr, void *state);
static int vac_cmp_itemptr(const void *left, const void *right);
static bool heap_page_is_all_visible(Relation rel, Buffer buf,
TransactionId *visibility_cutoff_xid, bool *all_frozen);
-
+static void lazy_scan_heap(LVRelStats *vacrelstats, Relation onerel,
+ Relation *Irels, int nindexes,ParallelHeapScanDesc pscan,
+ int options, bool aggressive);
+
+/* function prototypes for parallel vacuum */
+static void parallel_lazy_scan_heap(Relation rel, LVRelStats *vacrelstats,
+ int options, bool aggressive, int wnum);
+static void lazy_vacuum_worker(dsm_segment *seg, shm_toc *toc);
+static void lazy_gather_vacuum_stats(ParallelContext *pxct,
+ LVRelStats *valrelstats,
+ LVIndStats *vacindstats);
+static void lazy_update_index_stats(Relation onerel, LVIndStats *vacindstats);
+static void lazy_estimate_dsm(ParallelContext *pcxt, long maxtuples, int nindexes);
+static void lazy_initialize_dsm(ParallelContext *pcxt, Relation onrel,
+ LVRelStats *vacrelstats, int options,
+ bool aggressive);
+static void lazy_initialize_worker(shm_toc *toc, ParallelHeapScanDesc *pscan,
+ LVRelStats **vacrelstats, int *options,
+ bool *aggressive);
+static void lazy_clear_dead_tuple(LVRelStats *vacrelstats);
+static LVScanDesc lv_beginscan(LVRelStats *vacrelstats, ParallelHeapScanDesc pscan,
+ Relation onerel);
+static void lv_endscan(LVScanDesc lvscan);
+static BlockNumber lazy_scan_heap_get_nextpage(Relation onerel, LVRelStats* vacrelstats,
+ LVScanDesc lvscan, bool *all_visible_according_to_vm,
+ Buffer *vmbuffer, int options, bool aggressive);
+static void lazy_set_vacstate_and_wait_prepared(LVParallelState *pstate);
+static void lazy_set_vacstate_and_wait_finished(LVRelStats *vacrelstats);
+static bool lazy_check_vacstate_prepared(LVParallelState *pstate, uint32 round);
+static bool lazy_check_vacstate_finished(LVParallelState *pstate, uint32 round);
+static int lazy_count_vacstate_finished(LVParallelState *pstate, uint32 round, int *n_complete);
+static uint32 lazy_set_my_vacstate(LVParallelState *pstate, uint8 state, bool nextloop,
+ bool broadcast);
+static long lazy_get_max_dead_tuple(LVRelStats *vacrelstats);
/*
* lazy_vacuum_rel() -- perform LAZY VACUUM for one heap relation
@@ -173,7 +314,7 @@ static bool heap_page_is_all_visible(Relation rel, Buffer buf,
* and locked the relation.
*/
void
-lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+lazy_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,
BufferAccessStrategy bstrategy)
{
LVRelStats *vacrelstats;
@@ -205,7 +346,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
starttime = GetCurrentTimestamp();
}
- if (options & VACOPT_VERBOSE)
+ if (options.flags & VACOPT_VERBOSE)
elevel = INFO;
else
elevel = DEBUG2;
@@ -233,7 +374,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
xidFullScanLimit);
aggressive |= MultiXactIdPrecedesOrEquals(onerel->rd_rel->relminmxid,
mxactFullScanLimit);
- if (options & VACOPT_DISABLE_PAGE_SKIPPING)
+ if (options.flags & VACOPT_DISABLE_PAGE_SKIPPING)
aggressive = true;
vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
@@ -244,15 +385,26 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
vacrelstats->pages_removed = 0;
vacrelstats->lock_waiter_detected = false;
- /* Open all indexes of the relation */
- vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
- vacrelstats->hasindex = (nindexes > 0);
+ if (options.nworkers > 1)
+ {
+ vacrelstats->nindexes = list_length(RelationGetIndexList(onerel));
- /* Do the vacuuming */
- lazy_scan_heap(onerel, options, vacrelstats, Irel, nindexes, aggressive);
+ /* Do the vacuuming in parallel */
+ parallel_lazy_scan_heap(onerel, vacrelstats, options.flags, aggressive,
+ options.nworkers);
+ }
+ else
+ {
+ vac_open_indexes(onerel, RowExclusiveLock, &nindexes, &Irel);
+ vacrelstats->nindexes = nindexes;
- /* Done with indexes */
- vac_close_indexes(nindexes, Irel, NoLock);
+ /* Do the vacuuming */
+ lazy_scan_heap(vacrelstats, onerel, Irel, nindexes, NULL,
+ options.flags, aggressive);
+
+ /* Done with indexes */
+ vac_close_indexes(nindexes, Irel, RowExclusiveLock);
+ }
/*
* Compute whether we actually scanned the all unfrozen pages. If we did,
@@ -319,7 +471,7 @@ lazy_vacuum_rel(Relation onerel, int options, VacuumParams *params,
new_rel_pages,
new_rel_tuples,
new_rel_allvisible,
- vacrelstats->hasindex,
+ (vacrelstats->nindexes != 0),
new_frozen_xid,
new_min_multi,
false);
@@ -428,28 +580,121 @@ vacuum_log_cleanup_info(Relation rel, LVRelStats *vacrelstats)
}
/*
+ * Launch parallel vacuum workers specified by wnum and wait for all vacuum
+ * workers finish. Before launch vacuum worker we initialize dynamic shared memory
+ * and stores relevant data to it. After all workers finished we gather the vacuum
+ * statistics of all vacuum workers.
+ */
+static void
+parallel_lazy_scan_heap(Relation onerel, LVRelStats *vacrelstats,
+ int options, bool aggressive, int wnum)
+{
+ ParallelContext *pcxt;
+ long maxtuples;
+ LVIndStats *vacindstats;
+
+ vacindstats = (LVIndStats *) palloc(sizeof(LVIndStats) * vacrelstats->nindexes);
+
+ EnterParallelMode();
+
+ /* Create parallel context and initialize it */
+ pcxt = CreateParallelContext(lazy_vacuum_worker, wnum);
+
+ /* Estimate DSM size for parallel vacuum */
+ maxtuples = (int) lazy_get_max_dead_tuple(vacrelstats);
+ vacrelstats->max_dead_tuples = maxtuples;
+ lazy_estimate_dsm(pcxt, maxtuples, vacrelstats->nindexes);
+
+ /* Initialize DSM for parallel vacuum */
+ InitializeParallelDSM(pcxt);
+ lazy_initialize_dsm(pcxt, onerel, vacrelstats, options, aggressive);
+
+ /* Launch workers */
+ LaunchParallelWorkers(pcxt);
+
+ /* Wait for workers finished vacuum */
+ WaitForParallelWorkersToFinish(pcxt);
+
+ /* Gather the result of vacuum statistics from all workers */
+ lazy_gather_vacuum_stats(pcxt, vacrelstats, vacindstats);
+
+ /* Now we can compute the new value for pg_class.reltuples */
+ vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+ vacrelstats->rel_pages,
+ vacrelstats->scanned_pages,
+ vacrelstats->scanned_tuples);
+ DestroyParallelContext(pcxt);
+ ExitParallelMode();
+
+ /* After parallel mode, we can update index statistics */
+ lazy_update_index_stats(onerel, vacindstats);
+}
+
+/*
+ * Entry point of parallel vacuum worker.
+ */
+static void
+lazy_vacuum_worker(dsm_segment *seg, shm_toc *toc)
+{
+ ParallelHeapScanDesc pscan;
+ LVRelStats *vacrelstats;
+ int options;
+ bool aggressive;
+ Relation rel;
+ Relation *indrel;
+ int nindexes_worker;
+
+ /* Look up and initialize information and task */
+ lazy_initialize_worker(toc, &pscan, &vacrelstats, &options,
+ &aggressive);
+
+ rel = relation_open(pscan->phs_relid, ShareUpdateExclusiveLock);
+
+ /* Open all indexes */
+ vac_open_indexes(rel, RowExclusiveLock, &nindexes_worker,
+ &indrel);
+
+ /* Do lazy vacuum */
+ lazy_scan_heap(vacrelstats, rel, indrel, vacrelstats->nindexes,
+ pscan, options, aggressive);
+
+ heap_close(rel, ShareUpdateExclusiveLock);
+ vac_close_indexes(vacrelstats->nindexes, indrel, RowExclusiveLock);
+}
+
+/*
* lazy_scan_heap() -- scan an open heap relation
*
* This routine prunes each page in the heap, which will among other
* things truncate dead tuples to dead line pointers, defragment the
- * page, and set commit status bits (see heap_page_prune). It also builds
+ * page, and set commit status bits (see heap_page_prune). It also uses
* lists of dead tuples and pages with free space, calculates statistics
* on the number of live tuples in the heap, and marks pages as
* all-visible if appropriate. When done, or when we run low on space for
- * dead-tuple TIDs, invoke vacuuming of indexes and call lazy_vacuum_heap
- * to reclaim dead line pointers.
+ * dead-tuple TIDs, invoke vacuuming of assigned indexes and call lazy_vacuum_heap
+ * to reclaim dead line pointers. In parallel vacuum, we need to synchronize
+ * at where scanning heap finished and vacuuming heap finished. The vacuum
+ * worker reached to that point first need to wait for other vacuum workers
+ * reached to the same point.
+ *
+ * In parallel lazy scan, pscan is not NULL and we get next page number
+ * using parallel heap scan. We make two synchronization points at where
+ * before reclaiming dead tuple actually and after reclaimed them.
*
* If there are no indexes then we can reclaim line pointers on the fly;
* dead line pointers need only be retained until all index pointers that
* reference them have been killed.
*/
static void
-lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
- Relation *Irel, int nindexes, bool aggressive)
+lazy_scan_heap(LVRelStats *vacrelstats, Relation onerel, Relation *Irel,
+ int nindexes, ParallelHeapScanDesc pscan, int options,
+ bool aggressive)
{
- BlockNumber nblocks,
- blkno;
+ BlockNumber blkno;
+ BlockNumber nblocks;
HeapTupleData tuple;
+ LVScanDesc lvscan;
+ LVIndStats *vacindstats;
char *relname;
BlockNumber empty_pages,
vacuumed_pages;
@@ -461,10 +706,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
int i;
PGRUsage ru0;
Buffer vmbuffer = InvalidBuffer;
- BlockNumber next_unskippable_block;
- bool skipping_blocks;
xl_heap_freeze_tuple *frozen;
StringInfoData buf;
+ bool all_visible_according_to_vm = false;
+
const int initprog_index[] = {
PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_TOTAL_HEAP_BLKS,
@@ -482,11 +727,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
empty_pages = vacuumed_pages = 0;
num_tuples = tups_vacuumed = nkeep = nunused = 0;
+ nblocks = RelationGetNumberOfBlocks(onerel);
indstats = (IndexBulkDeleteResult **)
- palloc0(nindexes * sizeof(IndexBulkDeleteResult *));
+ palloc0(sizeof(IndexBulkDeleteResult *) * nindexes);
- nblocks = RelationGetNumberOfBlocks(onerel);
vacrelstats->rel_pages = nblocks;
vacrelstats->scanned_pages = 0;
vacrelstats->nonempty_pages = 0;
@@ -495,86 +740,24 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
lazy_space_alloc(vacrelstats, nblocks);
frozen = palloc(sizeof(xl_heap_freeze_tuple) * MaxHeapTuplesPerPage);
+ /* Array of index vacuum statistics */
+ vacindstats = vacrelstats->vacindstats;
+
+ /* Begin heap scan for vacuum */
+ lvscan = lv_beginscan(vacrelstats, pscan, onerel);
+
/* Report that we're scanning the heap, advertising total # of blocks */
initprog_val[0] = PROGRESS_VACUUM_PHASE_SCAN_HEAP;
initprog_val[1] = nblocks;
initprog_val[2] = vacrelstats->max_dead_tuples;
pgstat_progress_update_multi_param(3, initprog_index, initprog_val);
- /*
- * Except when aggressive is set, we want to skip pages that are
- * all-visible according to the visibility map, but only when we can skip
- * at least SKIP_PAGES_THRESHOLD consecutive pages. Since we're reading
- * sequentially, the OS should be doing readahead for us, so there's no
- * gain in skipping a page now and then; that's likely to disable
- * readahead and so be counterproductive. Also, skipping even a single
- * page means that we can't update relfrozenxid, so we only want to do it
- * if we can skip a goodly number of pages.
- *
- * When aggressive is set, we can't skip pages just because they are
- * all-visible, but we can still skip pages that are all-frozen, since
- * such pages do not need freezing and do not affect the value that we can
- * safely set for relfrozenxid or relminmxid.
- *
- * Before entering the main loop, establish the invariant that
- * next_unskippable_block is the next block number >= blkno that's not we
- * can't skip based on the visibility map, either all-visible for a
- * regular scan or all-frozen for an aggressive scan. We set it to
- * nblocks if there's no such block. We also set up the skipping_blocks
- * flag correctly at this stage.
- *
- * Note: The value returned by visibilitymap_get_status could be slightly
- * out-of-date, since we make this test before reading the corresponding
- * heap page or locking the buffer. This is OK. If we mistakenly think
- * that the page is all-visible or all-frozen when in fact the flag's just
- * been cleared, we might fail to vacuum the page. It's easy to see that
- * skipping a page when aggressive is not set is not a very big deal; we
- * might leave some dead tuples lying around, but the next vacuum will
- * find them. But even when aggressive *is* set, it's still OK if we miss
- * a page whose all-frozen marking has just been cleared. Any new XIDs
- * just added to that page are necessarily newer than the GlobalXmin we
- * computed, so they'll have no effect on the value to which we can safely
- * set relfrozenxid. A similar argument applies for MXIDs and relminmxid.
- *
- * We will scan the table's last page, at least to the extent of
- * determining whether it has tuples or not, even if it should be skipped
- * according to the above rules; except when we've already determined that
- * it's not worth trying to truncate the table. This avoids having
- * lazy_truncate_heap() take access-exclusive lock on the table to attempt
- * a truncation that just fails immediately because there are tuples in
- * the last page. This is worth avoiding mainly because such a lock must
- * be replayed on any hot standby, where it can be disruptive.
- */
- next_unskippable_block = 0;
- if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
- {
- while (next_unskippable_block < nblocks)
- {
- uint8 vmstatus;
+ lazy_set_my_vacstate(vacrelstats->pstate, VACSTATE_SCANNING, false, false);
- vmstatus = visibilitymap_get_status(onerel, next_unskippable_block,
- &vmbuffer);
- if (aggressive)
- {
- if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
- break;
- }
- else
- {
- if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
- break;
- }
- vacuum_delay_point();
- next_unskippable_block++;
- }
- }
-
- if (next_unskippable_block >= SKIP_PAGES_THRESHOLD)
- skipping_blocks = true;
- else
- skipping_blocks = false;
-
- for (blkno = 0; blkno < nblocks; blkno++)
+ while((blkno = lazy_scan_heap_get_nextpage(onerel, vacrelstats, lvscan,
+ &all_visible_according_to_vm,
+ &vmbuffer, options, aggressive)) !=
+ InvalidBlockNumber)
{
Buffer buf;
Page page;
@@ -585,100 +768,21 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
int prev_dead_count;
int nfrozen;
Size freespace;
- bool all_visible_according_to_vm = false;
bool all_visible;
bool all_frozen = true; /* provided all_visible is also true */
bool has_dead_tuples;
TransactionId visibility_cutoff_xid = InvalidTransactionId;
- /* see note above about forcing scanning of last page */
-#define FORCE_CHECK_PAGE() \
- (blkno == nblocks - 1 && should_attempt_truncation(vacrelstats))
-
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_SCANNED, blkno);
- if (blkno == next_unskippable_block)
- {
- /* Time to advance next_unskippable_block */
- next_unskippable_block++;
- if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
- {
- while (next_unskippable_block < nblocks)
- {
- uint8 vmskipflags;
-
- vmskipflags = visibilitymap_get_status(onerel,
- next_unskippable_block,
- &vmbuffer);
- if (aggressive)
- {
- if ((vmskipflags & VISIBILITYMAP_ALL_FROZEN) == 0)
- break;
- }
- else
- {
- if ((vmskipflags & VISIBILITYMAP_ALL_VISIBLE) == 0)
- break;
- }
- vacuum_delay_point();
- next_unskippable_block++;
- }
- }
-
- /*
- * We know we can't skip the current block. But set up
- * skipping_all_visible_blocks to do the right thing at the
- * following blocks.
- */
- if (next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
- skipping_blocks = true;
- else
- skipping_blocks = false;
-
- /*
- * Normally, the fact that we can't skip this block must mean that
- * it's not all-visible. But in an aggressive vacuum we know only
- * that it's not all-frozen, so it might still be all-visible.
- */
- if (aggressive && VM_ALL_VISIBLE(onerel, blkno, &vmbuffer))
- all_visible_according_to_vm = true;
- }
- else
- {
- /*
- * The current block is potentially skippable; if we've seen a
- * long enough run of skippable blocks to justify skipping it, and
- * we're not forced to check it, then go ahead and skip.
- * Otherwise, the page must be at least all-visible if not
- * all-frozen, so we can set all_visible_according_to_vm = true.
- */
- if (skipping_blocks && !FORCE_CHECK_PAGE())
- {
- /*
- * Tricky, tricky. If this is in aggressive vacuum, the page
- * must have been all-frozen at the time we checked whether it
- * was skippable, but it might not be any more. We must be
- * careful to count it as a skipped all-frozen page in that
- * case, or else we'll think we can't update relfrozenxid and
- * relminmxid. If it's not an aggressive vacuum, we don't
- * know whether it was all-frozen, so we have to recheck; but
- * in this case an approximate answer is OK.
- */
- if (aggressive || VM_ALL_FROZEN(onerel, blkno, &vmbuffer))
- vacrelstats->frozenskipped_pages++;
- continue;
- }
- all_visible_according_to_vm = true;
- }
-
vacuum_delay_point();
/*
* If we are close to overrunning the available space for dead-tuple
* TIDs, pause and do a cycle of vacuuming before we tackle this page.
*/
- if ((vacrelstats->max_dead_tuples - vacrelstats->num_dead_tuples) < MaxHeapTuplesPerPage &&
- vacrelstats->num_dead_tuples > 0)
+ if ((vacrelstats->max_dead_tuples - MyDeadTuple->n_dt) < MaxHeapTuplesPerPage &&
+ MyDeadTuple->n_dt > 0)
{
const int hvp_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -687,6 +791,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
int64 hvp_val[2];
/*
+ * Here scanning heap is done and we are going to reclaim dead
+ * tuples actually. Because other vacuum worker could not finished
+ * yet, we wait for all other workers finish.
+ */
+ lazy_set_vacstate_and_wait_prepared(vacrelstats->pstate);
+
+ /*
* Before beginning index vacuuming, we release any pin we may
* hold on the visibility map page. This isn't necessary for
* correctness, but we do it anyway to avoid holding the pin
@@ -705,11 +816,12 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_VACUUM_INDEX);
- /* Remove index entries */
+ /* Remove assigned index entries */
for (i = 0; i < nindexes; i++)
- lazy_vacuum_index(Irel[i],
- &indstats[i],
- vacrelstats);
+ {
+ if (IsAssignedIndex(i, vacrelstats->pstate->nworkers))
+ lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);
+ }
/*
* Report that we are now vacuuming the heap. We also increase
@@ -724,17 +836,28 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
/* Remove tuples from heap */
lazy_vacuum_heap(onerel, vacrelstats);
+
+ /* Report that we are once again scanning the heap */
+ pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
+ PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+
/*
* Forget the now-vacuumed tuples, and press on, but be careful
* not to reset latestRemovedXid since we want that value to be
- * valid.
+ * valid. In parallel lazy vacuum, we do that later process.
*/
- vacrelstats->num_dead_tuples = 0;
- vacrelstats->num_index_scans++;
+ if (vacrelstats->pstate == NULL)
+ lazy_clear_dead_tuple(vacrelstats);
- /* Report that we are once again scanning the heap */
- pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
- PROGRESS_VACUUM_PHASE_SCAN_HEAP);
+ /*
+ * Here we've done vacuum on the heap and index and we are going
+ * to begin the next round scan on heap. Wait until all vacuum worker
+ * finished vacuum. After all vacuum workers finished, all of dead
+ * tuple arrays are cleared by a process.
+ */
+ lazy_set_vacstate_and_wait_finished(vacrelstats);
+
+ vacrelstats->num_index_scans++;
}
/*
@@ -760,7 +883,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
* it's OK to skip vacuuming pages we get a lock conflict on. They
* will be dealt with in some future vacuum.
*/
- if (!aggressive && !FORCE_CHECK_PAGE())
+ if (!aggressive && !FORCE_CHECK_PAGE(blkno))
{
ReleaseBuffer(buf);
vacrelstats->pinskipped_pages++;
@@ -911,7 +1034,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
has_dead_tuples = false;
nfrozen = 0;
hastup = false;
- prev_dead_count = vacrelstats->num_dead_tuples;
+ prev_dead_count = MyDeadTuple->n_dt;
maxoff = PageGetMaxOffsetNumber(page);
/*
@@ -1120,10 +1243,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
/*
* If there are no indexes then we can vacuum the page right now
- * instead of doing a second scan.
+ * instead of doing a second scan. Because each parallel worker uses its
+ * own dead tuple area they can vacuum independently.
*/
- if (nindexes == 0 &&
- vacrelstats->num_dead_tuples > 0)
+ if (Irel == NULL && MyDeadTuple->n_dt > 0)
{
/* Remove tuples from heap */
lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
@@ -1134,7 +1257,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
* not to reset latestRemovedXid since we want that value to be
* valid.
*/
- vacrelstats->num_dead_tuples = 0;
+ lazy_clear_dead_tuple(vacrelstats);
vacuumed_pages++;
}
@@ -1237,7 +1360,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
* page, so remember its free space as-is. (This path will always be
* taken if there are no indexes.)
*/
- if (vacrelstats->num_dead_tuples == prev_dead_count)
+ if (MyDeadTuple->n_dt == prev_dead_count)
RecordPageWithFreeSpace(onerel, blkno, freespace);
}
@@ -1252,10 +1375,11 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
vacrelstats->new_dead_tuples = nkeep;
/* now we can compute the new value for pg_class.reltuples */
- vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
- nblocks,
- vacrelstats->scanned_pages,
- num_tuples);
+ if (vacrelstats->pstate == NULL)
+ vacrelstats->new_rel_tuples = vac_estimate_reltuples(onerel, false,
+ nblocks,
+ vacrelstats->scanned_pages,
+ num_tuples);
/*
* Release any remaining pin on visibility map page.
@@ -1268,7 +1392,7 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
/* If any tuples need to be deleted, perform final vacuum cycle */
/* XXX put a threshold on min number of tuples here? */
- if (vacrelstats->num_dead_tuples > 0)
+ if (MyDeadTuple->n_dt > 0)
{
const int hvp_index[] = {
PROGRESS_VACUUM_PHASE,
@@ -1276,6 +1400,13 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
};
int64 hvp_val[2];
+ /*
+ * Here, scanning heap is done and going to reclaim dead tuples
+ * actually. Because other vacuum worker might not finished yet,
+ * we need to wait for other workers finish.
+ */
+ lazy_set_vacstate_and_wait_prepared(vacrelstats->pstate);
+
/* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel, vacrelstats);
@@ -1285,9 +1416,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
/* Remove index entries */
for (i = 0; i < nindexes; i++)
- lazy_vacuum_index(Irel[i],
- &indstats[i],
- vacrelstats);
+ {
+ if (IsAssignedIndex(i, vacrelstats->pstate->nworkers))
+ lazy_vacuum_index(Irel[i], &indstats[i], vacrelstats);
+ }
/* Report that we are now vacuuming the heap */
hvp_val[0] = PROGRESS_VACUUM_PHASE_VACUUM_HEAP;
@@ -1297,10 +1429,22 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
/* Remove tuples from heap */
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
PROGRESS_VACUUM_PHASE_VACUUM_HEAP);
+
lazy_vacuum_heap(onerel, vacrelstats);
+
+ /*
+ * Here, we've done to vacuum on heap and going to begin the next
+ * scan on heap. Wait until all vacuum workers finish vacuum.
+ * Once all vacuum workers finished, one of the vacuum worker clears
+ * dead tuple array.
+ */
+ lazy_set_vacstate_and_wait_finished(vacrelstats);
vacrelstats->num_index_scans++;
}
+ /* Change my vacstate to Complete */
+ lazy_set_my_vacstate(vacrelstats->pstate, VACSTATE_COMPLETE, false, true);
+
/* report all blocks vacuumed; and that we're cleaning up */
pgstat_progress_update_param(PROGRESS_VACUUM_HEAP_BLKS_VACUUMED, blkno);
pgstat_progress_update_param(PROGRESS_VACUUM_PHASE,
@@ -1308,7 +1452,10 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
/* Do post-vacuum cleanup and statistics update for each index */
for (i = 0; i < nindexes; i++)
- lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
+ {
+ if (IsAssignedIndex(i, vacrelstats->pstate->nworkers))
+ lazy_cleanup_index(Irel[i], indstats[i], vacrelstats, &vacindstats[i]);
+ }
/* If no indexes, make log report that lazy_vacuum_heap would've made */
if (vacuumed_pages)
@@ -1317,6 +1464,8 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
RelationGetRelationName(onerel),
tups_vacuumed, vacuumed_pages)));
+ lv_endscan(lvscan);
+
/*
* This is pretty messy, but we split it up so that we can skip emitting
* individual parts of the message when not applicable.
@@ -1347,6 +1496,81 @@ lazy_scan_heap(Relation onerel, int options, LVRelStats *vacrelstats,
pfree(buf.data);
}
+/*
+ * gather_vacuum_stats() -- Gather vacuum statistics from workers
+ */
+static void
+lazy_gather_vacuum_stats(ParallelContext *pcxt, LVRelStats *vacrelstats,
+ LVIndStats *vacindstats)
+{
+ int i;
+ LVRelStats *lvstats_list;
+ LVIndStats *lvindstats_list;
+
+ lvstats_list = (LVRelStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_VACUUM_STATS);
+ lvindstats_list = (LVIndStats *) shm_toc_lookup(pcxt->toc, VACUUM_KEY_INDEX_STATS);
+
+ /* Gather each worker stats */
+ for (i = 0; i < pcxt->nworkers; i++)
+ {
+ LVRelStats *wstats = lvstats_list + sizeof(LVRelStats) * i;
+
+ vacrelstats->scanned_pages += wstats->scanned_pages;
+ vacrelstats->pinskipped_pages += wstats->pinskipped_pages;
+ vacrelstats->frozenskipped_pages += wstats->frozenskipped_pages;
+ vacrelstats->scanned_tuples += wstats->scanned_tuples;
+ vacrelstats->new_dead_tuples += wstats->new_dead_tuples;
+ vacrelstats->pages_removed += wstats->pages_removed;
+ vacrelstats->tuples_deleted += wstats->tuples_deleted;
+ vacrelstats->nonempty_pages += wstats->nonempty_pages;
+ }
+
+ /* all vacuum workers have same value of rel_pages */
+ vacrelstats->rel_pages = lvstats_list->rel_pages;
+
+ /* Copy index vacuum statistics on DSM to local memory */
+ memcpy(vacindstats, lvindstats_list, sizeof(LVIndStats) * vacrelstats->nindexes);
+}
+
+/*
+ * lazy_update_index_stats() -- Update index vacuum statistics
+ *
+ * This routine can not be called in parallel context.
+ */
+static void
+lazy_update_index_stats(Relation onerel, LVIndStats *vacindstats)
+{
+ List *indexoidlist;
+ ListCell *indexoidscan;
+ int i;
+
+ indexoidlist = RelationGetIndexList(onerel);
+ i = 0;
+
+ foreach(indexoidscan, indexoidlist)
+ {
+ Oid indexoid = lfirst_oid(indexoidscan);
+ Relation indrel;
+
+ /* Update index relation statistics if needed */
+ if (vacindstats[i].do_update)
+ {
+ indrel = index_open(indexoid, RowExclusiveLock);
+ vac_update_relstats(indrel,
+ vacindstats[i].rel_pages,
+ vacindstats[i].rel_tuples,
+ 0,
+ false,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ false);
+ index_close(indrel, RowExclusiveLock);
+ }
+ i++;
+ }
+
+ list_free(indexoidlist);
+}
/*
* lazy_vacuum_heap() -- second pass over the heap
@@ -1371,7 +1595,8 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
npages = 0;
tupindex = 0;
- while (tupindex < vacrelstats->num_dead_tuples)
+
+ while (tupindex < MyDeadTuple->n_dt)
{
BlockNumber tblk;
Buffer buf;
@@ -1380,7 +1605,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
vacuum_delay_point();
- tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+ tblk = ItemPointerGetBlockNumber(&MyDeadTuple->dt_array[tupindex]);
buf = ReadBufferExtended(onerel, MAIN_FORKNUM, tblk, RBM_NORMAL,
vac_strategy);
if (!ConditionalLockBufferForCleanup(buf))
@@ -1421,7 +1646,7 @@ lazy_vacuum_heap(Relation onerel, LVRelStats *vacrelstats)
*
* Caller must hold pin and buffer cleanup lock on the buffer.
*
- * tupindex is the index in vacrelstats->dead_tuples of the first dead
+ * tupindex is the index in MyDeadTuple->dt_array of the first dead
* tuple for this page. We assume the rest follow sequentially.
* The return value is the first tupindex after the tuples of this page.
*/
@@ -1439,16 +1664,16 @@ lazy_vacuum_page(Relation onerel, BlockNumber blkno, Buffer buffer,
START_CRIT_SECTION();
- for (; tupindex < vacrelstats->num_dead_tuples; tupindex++)
+ for (; tupindex < MyDeadTuple->n_dt; tupindex++)
{
BlockNumber tblk;
OffsetNumber toff;
ItemId itemid;
- tblk = ItemPointerGetBlockNumber(&vacrelstats->dead_tuples[tupindex]);
+ tblk = ItemPointerGetBlockNumber(&MyDeadTuple->dt_array[tupindex]);
if (tblk != blkno)
break; /* past end of tuples for this block */
- toff = ItemPointerGetOffsetNumber(&vacrelstats->dead_tuples[tupindex]);
+ toff = ItemPointerGetOffsetNumber(&MyDeadTuple->dt_array[tupindex]);
itemid = PageGetItemId(page, toff);
ItemIdSetUnused(itemid);
unused[uncnt++] = toff;
@@ -1573,7 +1798,7 @@ lazy_check_needs_freeze(Buffer buf, bool *hastup)
* lazy_vacuum_index() -- vacuum one index relation.
*
* Delete all the index entries pointing to tuples listed in
- * vacrelstats->dead_tuples, and update running statistics.
+ * MyDeadTuple->dt_array, and update running statistics.
*/
static void
lazy_vacuum_index(Relation indrel,
@@ -1582,6 +1807,7 @@ lazy_vacuum_index(Relation indrel,
{
IndexVacuumInfo ivinfo;
PGRUsage ru0;
+ double total_n_dead_tuples = 0;
pg_rusage_init(&ru0);
@@ -1596,10 +1822,25 @@ lazy_vacuum_index(Relation indrel,
*stats = index_bulk_delete(&ivinfo, *stats,
lazy_tid_reaped, (void *) vacrelstats);
+ /* Count total number of scanned tuples during index vacuum */
+ if (vacrelstats->pstate == NULL)
+ total_n_dead_tuples = MyDeadTuple->n_dt;
+ else
+ {
+ int i;
+
+ /*
+ * Since no vacuum worker updates the dead tuple arrays during the
+ * reclaim phase, we can read them without holding a lock.
+ */
+ for (i = 0; i < vacrelstats->pstate->nworkers; i++)
+ total_n_dead_tuples += (vacrelstats->dead_tuples[i]).n_dt;
+ }
+
ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
+ (errmsg("scanned index \"%s\" to remove %0.f row versions",
RelationGetRelationName(indrel),
- vacrelstats->num_dead_tuples),
+ total_n_dead_tuples),
errdetail("%s.", pg_rusage_show(&ru0))));
}
@@ -1609,7 +1850,8 @@ lazy_vacuum_index(Relation indrel,
static void
lazy_cleanup_index(Relation indrel,
IndexBulkDeleteResult *stats,
- LVRelStats *vacrelstats)
+ LVRelStats *vacrelstats,
+ LVIndStats *vacindstats)
{
IndexVacuumInfo ivinfo;
PGRUsage ru0;
@@ -1630,17 +1872,31 @@ lazy_cleanup_index(Relation indrel,
/*
* Now update statistics in pg_class, but only if the index says the count
- * is accurate.
+ * is accurate. In parallel lazy vacuum, the worker can not update these
+ * information by itself, so save to DSM and then the launcher process
+ * updates it later.
*/
if (!stats->estimated_count)
- vac_update_relstats(indrel,
- stats->num_pages,
- stats->num_index_tuples,
- 0,
- false,
- InvalidTransactionId,
- InvalidMultiXactId,
- false);
+ {
+ if (IsParallelWorker())
+ {
+ /* Save to shared memory */
+ vacindstats->do_update = true;
+ vacindstats->rel_pages = stats->num_pages;
+ vacindstats->rel_tuples = stats->num_index_tuples;
+ }
+ else
+ {
+ vac_update_relstats(indrel,
+ stats->num_pages,
+ stats->num_index_tuples,
+ 0,
+ false,
+ InvalidTransactionId,
+ InvalidMultiXactId,
+ false);
+ }
+ }
ereport(elevel,
(errmsg("index \"%s\" now contains %.0f row versions in %u pages",
@@ -1938,62 +2194,102 @@ count_nondeletable_pages(Relation onerel, LVRelStats *vacrelstats)
/*
* lazy_space_alloc - space allocation decisions for lazy vacuum
*
+ * In parallel lazy vacuum the space for dead tuple locations are already
+ * allocated in dynamic shared memory, so we allocate space for dead tuple
+ * locations in local memory only when in not parallel lazy vacuum and set
+ * MyDeadTuple.
+ *
* See the comments at the head of this file for rationale.
*/
static void
lazy_space_alloc(LVRelStats *vacrelstats, BlockNumber relblocks)
{
- long maxtuples;
- int vac_work_mem = IsAutoVacuumWorkerProcess() &&
- autovacuum_work_mem != -1 ?
- autovacuum_work_mem : maintenance_work_mem;
-
- if (vacrelstats->hasindex)
+ /*
+ * If not in parallel lazy vacuum, we need to allocate the dead
+ * tuple array in local memory.
+ */
+ if (vacrelstats->pstate == NULL)
{
- maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
- maxtuples = Min(maxtuples, INT_MAX);
- maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
-
- /* curious coding here to ensure the multiplication can't overflow */
- if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
- maxtuples = relblocks * LAZY_ALLOC_TUPLES;
+ long maxtuples = lazy_get_max_dead_tuple(vacrelstats);
- /* stay sane if small maintenance_work_mem */
- maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+ vacrelstats->dead_tuples = (LVDeadTuple *) palloc(sizeof(LVDeadTuple));
+ MyDeadTuple = vacrelstats->dead_tuples;
+ MyDeadTuple->dt_array = palloc0(sizeof(ItemPointerData) * (int)maxtuples);
+ vacrelstats->max_dead_tuples = maxtuples;
}
else
{
- maxtuples = MaxHeapTuplesPerPage;
+ /*
+ * In parallel lazy vacuum, we initialize the dead tuple array.
+ * The LVDeadTuple array is stored at the beginning of the dead_tuples
+ * variable, so the remaining space can be used for the dead tuple arrays.
+ * The dt_base variable points to the beginning of the dead tuple array.
+ */
+
+ char *dt_base = (char *)vacrelstats->dead_tuples;
+ LVDeadTuple *dt = &(vacrelstats->dead_tuples[ParallelWorkerNumber]);
+
+ /* Adjust dt_base to the beginning of dead tuple array */
+ dt_base += sizeof(LVDeadTuple) * vacrelstats->pstate->nworkers;
+ dt->dt_array = (ItemPointer)
+ (dt_base + sizeof(ItemPointerData) * vacrelstats->max_dead_tuples * ParallelWorkerNumber);
+
+ /* set MyDeadTuple */
+ MyDeadTuple = dt;
}
- vacrelstats->num_dead_tuples = 0;
- vacrelstats->max_dead_tuples = (int) maxtuples;
- vacrelstats->dead_tuples = (ItemPointer)
- palloc(maxtuples * sizeof(ItemPointerData));
+ MyDeadTuple->n_dt = 0;
}
/*
* lazy_record_dead_tuple - remember one deletable tuple
*/
static void
-lazy_record_dead_tuple(LVRelStats *vacrelstats,
- ItemPointer itemptr)
+lazy_record_dead_tuple(LVRelStats *vacrelstats, ItemPointer itemptr)
{
/*
* The array shouldn't overflow under normal behavior, but perhaps it
* could if we are given a really small maintenance_work_mem. In that
* case, just forget the last few tuples (we'll get 'em next time).
*/
- if (vacrelstats->num_dead_tuples < vacrelstats->max_dead_tuples)
+ if (MyDeadTuple->n_dt < vacrelstats->max_dead_tuples)
{
- vacrelstats->dead_tuples[vacrelstats->num_dead_tuples] = *itemptr;
- vacrelstats->num_dead_tuples++;
- pgstat_progress_update_param(PROGRESS_VACUUM_NUM_DEAD_TUPLES,
- vacrelstats->num_dead_tuples);
+ /*
+ * In parallel lazy vacuum, since each parallel vacuum worker has
+ * its own dead tuple array, we don't need to do this exclusively.
+ */
+ MyDeadTuple->dt_array[MyDeadTuple->n_dt] = *itemptr;
+ (MyDeadTuple->n_dt)++;
+
+ /* XXX : Update progress information here */
}
}
/*
+ * lazy_clear_dead_tuple() -- clear dead tuple list
+ */
+static void
+lazy_clear_dead_tuple(LVRelStats *vacrelstats)
+{
+ /*
+ * In parallel lazy vacuum one of the parallel workers is responsible
+ * for clearing all dead tuples. Note that we assume only one
+ * process touches the dead tuple arrays here.
+ */
+ if (vacrelstats->pstate != NULL && vacrelstats->nindexes != 0)
+ {
+ int i;
+ for (i = 0; i < vacrelstats->pstate->nworkers; i++)
+ {
+ LVDeadTuple *dead_tuples = &(vacrelstats->dead_tuples[i]);
+ dead_tuples->n_dt = 0;
+ }
+ }
+ else
+ MyDeadTuple->n_dt = 0;
+}
+
+/*
* lazy_tid_reaped() -- is a particular tid deletable?
*
* This has the right signature to be an IndexBulkDeleteCallback.
@@ -2005,14 +2301,33 @@ lazy_tid_reaped(ItemPointer itemptr, void *state)
{
LVRelStats *vacrelstats = (LVRelStats *) state;
ItemPointer res;
+ int i;
+ int num = (vacrelstats->pstate == NULL) ? 1 : vacrelstats->pstate->nworkers;
+
+ /*
+ * In parallel lazy vacuum all dead tuple TID locations are stored in
+ * dynamic shared memory together, and the set of dead tuple arrays as a
+ * whole is not ordered. However, since each worker's dead tuple array is
+ * ordered by TID location, we can simply search 'num' times. Since no
+ * writes happen here, vacuum workers access the dead tuple arrays
+ * without holding a lock.
+ */
+ for (i = 0; i < num; i++)
+ {
+ ItemPointer dead_tuples = (vacrelstats->dead_tuples[i]).dt_array;
+ int n_tuples = (vacrelstats->dead_tuples[i]).n_dt;
- res = (ItemPointer) bsearch((void *) itemptr,
- (void *) vacrelstats->dead_tuples,
- vacrelstats->num_dead_tuples,
- sizeof(ItemPointerData),
- vac_cmp_itemptr);
+ res = (ItemPointer) bsearch((void *) itemptr,
+ (void *) dead_tuples,
+ n_tuples,
+ sizeof(ItemPointerData),
+ vac_cmp_itemptr);
- return (res != NULL);
+ if (res != NULL)
+ return true;
+ }
+
+ return false;
}
/*
@@ -2156,3 +2471,649 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
return all_visible;
}
+
+/*
+ * Return the block number we need to scan next, or InvalidBlockNumber if scan
+ * is done.
+ *
+ * Except when aggressive is set, we want to skip pages that are
+ * all-visible according to the visibility map, but only when we can skip
+ * at least SKIP_PAGES_THRESHOLD consecutive pages, if we're not in parallel
+ * mode. Since we're reading sequentially, the OS should be doing readahead
+ * for us, so there's no gain in skipping a page now and then; that's likely
+ * to disable readahead and so be counterproductive. Also, skipping even a
+ * single page means that we can't update relfrozenxid, so we only want to do it
+ * if we can skip a goodly number of pages.
+ *
+ * When aggressive is set, we can't skip pages just because they are
+ * all-visible, but we can still skip pages that are all-frozen, since
+ * such pages do not need freezing and do not affect the value that we can
+ * safely set for relfrozenxid or relminmxid.
+ *
+ * In non-parallel mode, before entering the main loop, establish the
+ * invariant that next_unskippable_block is the next block number >= blkno
+ * that we can't skip based on the visibility map, either all-visible
+ * for a regular scan or all-frozen for an aggressive scan. We set it to
+ * nblocks if there's no such block. We also set up the skipping_blocks
+ * flag correctly at this stage.
+ *
+ * In parallel mode, vacrelstats->pstate is not NULL. We scan heap pages
+ * using a parallel heap scan description. Each worker calls
+ * heap_parallelscan_nextpage() to exclusively get the next block number
+ * to scan. If the given block is all-visible according to the visibility
+ * map, we skip it immediately, unlike a non-parallel lazy scan.
+ *
+ * Note: The value returned by visibilitymap_get_status could be slightly
+ * out-of-date, since we make this test before reading the corresponding
+ * heap page or locking the buffer. This is OK. If we mistakenly think
+ * that the page is all-visible or all-frozen when in fact the flag's just
+ * been cleared, we might fail to vacuum the page. It's easy to see that
+ * skipping a page when aggressive is not set is not a very big deal; we
+ * might leave some dead tuples lying around, but the next vacuum will
+ * find them. But even when aggressive *is* set, it's still OK if we miss
+ * a page whose all-frozen marking has just been cleared. Any new XIDs
+ * just added to that page are necessarily newer than the GlobalXmin we
+ * computed, so they'll have no effect on the value to which we can safely
+ * set relfrozenxid. A similar argument applies for MXIDs and relminmxid.
+ *
+ * We will scan the table's last page, at least to the extent of
+ * determining whether it has tuples or not, even if it should be skipped
+ * according to the above rules; except when we've already determined that
+ * it's not worth trying to truncate the table. This avoids having
+ * lazy_truncate_heap() take access-exclusive lock on the table to attempt
+ * a truncation that just fails immediately because there are tuples in
+ * the last page. This is worth avoiding mainly because such a lock must
+ * be replayed on any hot standby, where it can be disruptive.
+ */
+static BlockNumber
+lazy_scan_heap_get_nextpage(Relation onerel, LVRelStats *vacrelstats,
+ LVScanDesc lvscan, bool *all_visible_according_to_vm,
+ Buffer *vmbuffer, int options, bool aggressive)
+{
+ BlockNumber blkno;
+
+ if (vacrelstats->pstate != NULL)
+ {
+ /*
+ * In parallel lazy vacuum, since it's hard to know how many consecutive
+ * all-visible pages exist on the table, we skip the heap page
+ * immediately if it is an all-visible page.
+ */
+ while ((blkno = heap_parallelscan_nextpage(lvscan->heapscan)) != InvalidBlockNumber)
+ {
+ *all_visible_according_to_vm = false;
+ vacuum_delay_point();
+
+ /* Consider skipping this page according to the visibility map */
+ if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0 &&
+ !FORCE_CHECK_PAGE(blkno))
+ {
+ uint8 vmstatus;
+
+ vmstatus = visibilitymap_get_status(onerel, blkno, vmbuffer);
+
+ if (aggressive)
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+ {
+ vacrelstats->frozenskipped_pages++;
+ continue;
+ }
+ else if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ *all_visible_according_to_vm = true;
+ }
+ else
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) != 0)
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) != 0)
+ vacrelstats->frozenskipped_pages++;
+ continue;
+ }
+ }
+ }
+
+ /* We need to scan current blkno, break */
+ break;
+ }
+ }
+ else
+ {
+ bool skipping_blocks = false;
+
+ /* Initialize lv_next_unskippable_block if needed */
+ if (lvscan->lv_cblock == 0 && (options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+ {
+ while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+ {
+ uint8 vmstatus;
+
+ vmstatus = visibilitymap_get_status(onerel, lvscan->lv_next_unskippable_block,
+ vmbuffer);
+ if (aggressive)
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+ break;
+ }
+ else
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ break;
+ }
+ vacuum_delay_point();
+ lvscan->lv_next_unskippable_block++;
+ }
+
+ if (lvscan->lv_next_unskippable_block >= SKIP_PAGES_THRESHOLD)
+ skipping_blocks = true;
+ else
+ skipping_blocks = false;
+ }
+
+ /* Decide the block number we need to scan */
+ for (blkno = lvscan->lv_cblock; blkno < lvscan->lv_nblocks; blkno++)
+ {
+ if (blkno == lvscan->lv_next_unskippable_block)
+ {
+ /* Time to advance next_unskippable_block */
+ lvscan->lv_next_unskippable_block++;
+ if ((options & VACOPT_DISABLE_PAGE_SKIPPING) == 0)
+ {
+ while (lvscan->lv_next_unskippable_block < lvscan->lv_nblocks)
+ {
+ uint8 vmstatus;
+
+ vmstatus = visibilitymap_get_status(onerel,
+ lvscan->lv_next_unskippable_block,
+ vmbuffer);
+ if (aggressive)
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_FROZEN) == 0)
+ break;
+ }
+ else
+ {
+ if ((vmstatus & VISIBILITYMAP_ALL_VISIBLE) == 0)
+ break;
+ }
+ vacuum_delay_point();
+ lvscan->lv_next_unskippable_block++;
+ }
+ }
+
+ /*
+ * We know we can't skip the current block. But set up
+ * skipping_all_visible_blocks to do the right thing at the
+ * following blocks.
+ */
+ if (lvscan->lv_next_unskippable_block - blkno > SKIP_PAGES_THRESHOLD)
+ skipping_blocks = true;
+ else
+ skipping_blocks = false;
+
+ /*
+ * Normally, the fact that we can't skip this block must mean that
+ * it's not all-visible. But in an aggressive vacuum we know only
+ * that it's not all-frozen, so it might still be all-visible.
+ */
+ if (aggressive && VM_ALL_VISIBLE(onerel, blkno, vmbuffer))
+ *all_visible_according_to_vm = true;
+
+ /* Found the next unskippable block number */
+ break;
+ }
+ else
+ {
+ /*
+ * The current block is potentially skippable; if we've seen a
+ * long enough run of skippable blocks to justify skipping it, and
+ * we're not forced to check it, then go ahead and skip.
+ * Otherwise, the page must be at least all-visible if not
+ * all-frozen, so we can set all_visible_according_to_vm = true.
+ */
+ if (skipping_blocks && !FORCE_CHECK_PAGE(blkno))
+ {
+ /*
+ * Tricky, tricky. If this is in aggressive vacuum, the page
+ * must have been all-frozen at the time we checked whether it
+ * was skippable, but it might not be any more. We must be
+ * careful to count it as a skipped all-frozen page in that
+ * case, or else we'll think we can't update relfrozenxid and
+ * relminmxid. If it's not an aggressive vacuum, we don't
+ * know whether it was all-frozen, so we have to recheck; but
+ * in this case an approximate answer is OK.
+ */
+ if (aggressive || VM_ALL_FROZEN(onerel, blkno, vmbuffer))
+ vacrelstats->frozenskipped_pages++;
+ continue;
+ }
+
+ *all_visible_according_to_vm = true;
+
+ /* We need to scan current blkno, break */
+ break;
+ }
+ } /* for */
+
+ /* Advance the current block number for the next scan */
+ lvscan->lv_cblock = blkno + 1;
+ }
+
+ return (blkno == lvscan->lv_nblocks) ? InvalidBlockNumber : blkno;
+}
+
+/*
+ * Begin lazy vacuum scan. lvscan->heapscan is NULL if
+ * we're not in parallel lazy vacuum.
+ */
+static LVScanDesc
+lv_beginscan(LVRelStats *vacrelstats, ParallelHeapScanDesc pscan,
+ Relation onerel)
+{
+ LVScanDesc lvscan;
+
+ lvscan = (LVScanDesc) palloc(sizeof(LVScanDescData));
+
+ lvscan->lv_cblock = 0;
+ lvscan->lv_next_unskippable_block = 0;
+ lvscan->lv_nblocks = vacrelstats->rel_pages;
+
+ if (pscan != NULL)
+ lvscan->heapscan = heap_beginscan_parallel(onerel, pscan);
+ else
+ lvscan->heapscan = NULL;
+
+ return lvscan;
+}
+
+/*
+ * End lazy vacuum scan.
+ */
+static void
+lv_endscan(LVScanDesc lvscan)
+{
+ if (lvscan->heapscan != NULL)
+ heap_endscan(lvscan->heapscan);
+ pfree(lvscan);
+}
+
+/* ----------------------------------------------------------------
+ * Parallel Lazy Vacuum Support
+ * ----------------------------------------------------------------
+ */
+
+/*
+ * Estimate storage for parallel lazy vacuum.
+ */
+static void
+lazy_estimate_dsm(ParallelContext *pcxt, long maxtuples, int nindexes)
+{
+ int size = 0;
+ int keys = 0;
+
+ /* Estimate size for parallel heap scan */
+ size += heap_parallelscan_estimate(SnapshotAny);
+ keys++;
+
+ /* Estimate size for vacuum statistics */
+ size += BUFFERALIGN(sizeof(LVRelStats) * pcxt->nworkers);
+ keys++;
+
+ /* Estimate size for index vacuum statistics */
+ size += BUFFERALIGN(sizeof(LVIndStats) * nindexes);
+ keys++;
+
+ /* Estimate size for dead tuple arrays */
+ size += BUFFERALIGN((sizeof(LVDeadTuple) + sizeof(ItemPointerData) * maxtuples) * pcxt->nworkers);
+ keys++;
+
+ /* Estimate size for parallel lazy vacuum state */
+ size += BUFFERALIGN(sizeof(LVParallelState) + sizeof(VacWorker) * pcxt->nworkers);
+ keys++;
+
+ /* Estimate size for vacuum task */
+ size += BUFFERALIGN(sizeof(VacuumTask));
+ keys++;
+
+ shm_toc_estimate_chunk(&pcxt->estimator, size);
+ shm_toc_estimate_keys(&pcxt->estimator, keys);
+}
+
+/*
+ * Initialize dynamic shared memory for parallel lazy vacuum. We store
+ * relevant information for parallel heap scanning, the dead tuple arrays
+ * and vacuum statistics for each worker, and some parameters for
+ * lazy vacuum.
+ */
+static void
+lazy_initialize_dsm(ParallelContext *pcxt, Relation onerel,
+ LVRelStats *vacrelstats, int options,
+ bool aggressive)
+{
+ ParallelHeapScanDesc pscan;
+ LVRelStats *lvrelstats;
+ LVIndStats *lvindstats;
+ LVDeadTuple *dead_tuples;
+ LVParallelState *pstate;
+ VacuumTask *vacuum_task;
+ int i;
+ int dead_tuples_size;
+ int pstate_size;
+
+ /* Allocate and initialize DSM for parallel scan description */
+ pscan = (ParallelHeapScanDesc) shm_toc_allocate(pcxt->toc,
+ heap_parallelscan_estimate(SnapshotAny));
+
+ shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_SCAN, pscan);
+ heap_parallelscan_initialize(pscan, onerel, SnapshotAny);
+
+ /* Allocate and initialize DSM for vacuum stats for each worker */
+ lvrelstats = (LVRelStats *)shm_toc_allocate(pcxt->toc,
+ sizeof(LVRelStats) * pcxt->nworkers);
+ shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_STATS, lvrelstats);
+ for (i = 0; i < pcxt->nworkers; i++)
+ {
+ LVRelStats *stats = lvrelstats + sizeof(LVRelStats) * i;
+
+ memcpy(stats, vacrelstats, sizeof(LVRelStats));
+ }
+
+ /* Allocate and initialize DSM for dead tuple array */
+ dead_tuples_size = sizeof(LVDeadTuple) * pcxt->nworkers;
+ dead_tuples_size += sizeof(ItemPointerData) * vacrelstats->max_dead_tuples * pcxt->nworkers;
+ dead_tuples = (LVDeadTuple *) shm_toc_allocate(pcxt->toc, dead_tuples_size);
+ vacrelstats->dead_tuples = dead_tuples;
+ shm_toc_insert(pcxt->toc, VACUUM_KEY_DEAD_TUPLES, dead_tuples);
+
+ /* Allocate DSM for index vacuum statistics */
+ lvindstats = (LVIndStats *) shm_toc_allocate(pcxt->toc,
+ sizeof(LVIndStats) * vacrelstats->nindexes);
+ shm_toc_insert(pcxt->toc, VACUUM_KEY_INDEX_STATS, lvindstats);
+
+
+ /* Allocate and initialize DSM for parallel state */
+ pstate_size = sizeof(LVParallelState) + sizeof(VacWorker) * pcxt->nworkers;
+ pstate = (LVParallelState *) shm_toc_allocate(pcxt->toc, pstate_size);
+ shm_toc_insert(pcxt->toc, VACUUM_KEY_PARALLEL_STATE, pstate);
+ pstate->nworkers = pcxt->nworkers;
+ ConditionVariableInit(&(pstate->cv));
+ SpinLockInit(&(pstate->mutex));
+
+ /* Allocate and initialize DSM for vacuum task */
+ vacuum_task = (VacuumTask *) shm_toc_allocate(pcxt->toc, sizeof(VacuumTask));
+ shm_toc_insert(pcxt->toc, VACUUM_KEY_VACUUM_TASK, vacuum_task);
+ vacuum_task->aggressive = aggressive;
+ vacuum_task->options = options;
+ vacuum_task->oldestxmin = OldestXmin;
+ vacuum_task->freezelimit = FreezeLimit;
+ vacuum_task->multixactcutoff = MultiXactCutoff;
+ vacuum_task->elevel = elevel;
+}
+
+/*
+ * Initialize parallel lazy vacuum for worker.
+ */
+static void
+lazy_initialize_worker(shm_toc *toc, ParallelHeapScanDesc *pscan,
+ LVRelStats **vacrelstats, int *options,
+ bool *aggressive)
+{
+ LVRelStats *lvstats;
+ LVIndStats *vacindstats;
+ VacuumTask *vacuum_task;
+ LVDeadTuple *dead_tuples;
+ LVParallelState *pstate;
+
+ /* Set up parallel heap scan description */
+ *pscan = (ParallelHeapScanDesc) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_SCAN);
+
+ /* Set up vacuum stats */
+ lvstats = (LVRelStats *) shm_toc_lookup(toc, VACUUM_KEY_VACUUM_STATS);
+ *vacrelstats = lvstats + sizeof(LVRelStats) * ParallelWorkerNumber;
+
+ /* Set up vacuum index statistics */
+ vacindstats = (LVIndStats *) shm_toc_lookup(toc, VACUUM_KEY_INDEX_STATS);
+ (*vacrelstats)->vacindstats = (LVIndStats *)vacindstats;
+
+ /* Set up dead tuple list */
+ dead_tuples = (LVDeadTuple *) shm_toc_lookup(toc, VACUUM_KEY_DEAD_TUPLES);
+ (*vacrelstats)->dead_tuples = dead_tuples;
+
+ /* Set up vacuum task */
+ vacuum_task = (VacuumTask *) shm_toc_lookup(toc, VACUUM_KEY_VACUUM_TASK);
+
+ /* Set up parallel vacuum state */
+ pstate = (LVParallelState *) shm_toc_lookup(toc, VACUUM_KEY_PARALLEL_STATE);
+ (*vacrelstats)->pstate = pstate;
+ MyVacWorker = &(pstate->vacworkers[ParallelWorkerNumber]);
+ MyVacWorker->state = VACSTATE_STARTUP;
+
+ /* Set up parameters for lazy vacuum */
+ OldestXmin = vacuum_task->oldestxmin;
+ FreezeLimit = vacuum_task->freezelimit;
+ MultiXactCutoff = vacuum_task->multixactcutoff;
+ elevel = vacuum_task->elevel;
+ *options = vacuum_task->options;
+ *aggressive = vacuum_task->aggressive;
+}
+
+/*
+ * Set my vacuum state exclusively and wait until all vacuum workers
+ * finish vacuum.
+ */
+static void
+lazy_set_vacstate_and_wait_finished(LVRelStats *vacrelstats)
+{
+ LVParallelState *pstate = vacrelstats->pstate;
+ uint32 round;
+ int n_count, n_comp;
+
+ /* Exit if not in parallel vacuum */
+ if (pstate == NULL)
+ return;
+
+ SpinLockAcquire(&(pstate->mutex));
+
+ /* Change my vacstate */
+ round = MyVacWorker->round;
+ MyVacWorker->state = VACSTATE_VACUUM_FINISHED;
+
+ /* Check all vacuum worker states */
+ n_count = lazy_count_vacstate_finished(pstate, round, &n_comp);
+
+ /*
+ * If I'm the last running worker that has reached here, clear the
+ * dead tuples. Note that clearing the dead tuple arrays must be done
+ * by only one worker, and while holding the lock.
+ */
+ if ((n_count + n_comp) == pstate->nworkers)
+ lazy_clear_dead_tuple(vacrelstats);
+
+ SpinLockRelease(&(pstate->mutex));
+
+ ConditionVariablePrepareToSleep(&(pstate->cv));
+
+ /* Sleep until all vacuum workers have reached here */
+ while (!lazy_check_vacstate_finished(pstate, round))
+ ConditionVariableSleep(&(pstate->cv), WAIT_EVENT_PARALLEL_FINISH);
+
+ ConditionVariableCancelSleep();
+
+ /* For next round scan, change its state and increment round number */
+ lazy_set_my_vacstate(pstate, VACSTATE_SCANNING, true, false);
+}
+
+/*
+ * Set my vacuum state exclusively and wait until all vacuum workers
+ * prepared vacuum.
+ */
+static void
+lazy_set_vacstate_and_wait_prepared(LVParallelState *pstate)
+{
+ uint32 round;
+
+ /* Exit if not in parallel vacuum */
+ if (pstate == NULL)
+ return;
+
+ /* update my vacstate */
+ round = lazy_set_my_vacstate(pstate, VACSTATE_VACUUM_PREPARED, false, true);
+
+ ConditionVariablePrepareToSleep(&(pstate->cv));
+
+ /* Sleep until all vacuum workers have reached here */
+ while (!lazy_check_vacstate_prepared(pstate, round))
+ ConditionVariableSleep(&(pstate->cv), WAIT_EVENT_PARALLEL_FINISH);
+
+ ConditionVariableCancelSleep();
+
+ /* For next round scan, change its state */
+ lazy_set_my_vacstate(pstate, VACSTATE_VACUUMING, false, false);
+}
+
+/*
+ * Set my vacstate. After setting the state we increment its round and notify
+ * other waiting processes if required. Return its current round number.
+ */
+static uint32
+lazy_set_my_vacstate(LVParallelState *pstate, uint8 state, bool nextloop,
+ bool broadcast)
+{
+ uint32 round;
+
+ /* Quick exit if not in parallel vacuum */
+ if (pstate == NULL)
+ return 0;
+
+ Assert(IsParallelWorker());
+
+ SpinLockAcquire(&(pstate->mutex));
+
+ MyVacWorker->state = state;
+ round = MyVacWorker->round;
+
+ /* Increment its round number */
+ if (nextloop)
+ (MyVacWorker->round)++;
+
+ SpinLockRelease(&(pstate->mutex));
+
+ /* Notify other waiting vacuum workers */
+ if (broadcast)
+ ConditionVariableBroadcast(&(pstate->cv));
+
+ return round;
+}
+
+/*
+ * Check if all vacuum workers have finished scanning the heap and are prepared
+ * to reclaim dead tuples. Return true if all vacuum workers have prepared.
+ * Otherwise return false.
+ */
+static bool
+lazy_check_vacstate_prepared(LVParallelState *pstate, uint32 round)
+{
+ int n_count = 0;
+ int n_comp = 0;
+ int i;
+
+ SpinLockAcquire(&(pstate->mutex));
+
+ /*
+ * Count vacuum workers who are in a countable state on the same round and
+ * who are in VACSTATE_COMPLETE state.
+ */
+ for (i = 0; i < pstate->nworkers; i++)
+ {
+ VacWorker *vacworker = &(pstate->vacworkers[i]);
+ uint32 w_round = vacworker->round;
+
+ if ((vacworker->state & VACPHASE_VACUUM) != 0 && w_round == round)
+ n_count++;
+ else if (vacworker->state == VACSTATE_COMPLETE)
+ n_comp++;
+ }
+
+ SpinLockRelease(&(pstate->mutex));
+
+ return (n_count + n_comp) == pstate->nworkers;
+}
+
+/*
+ * Check if all vacuum workers have finished vacuuming the table and indexes.
+ * Return true if all vacuum workers have finished. Otherwise return false.
+ */
+static bool
+lazy_check_vacstate_finished(LVParallelState *pstate, uint32 round)
+{
+ int n_count, n_comp;
+
+ SpinLockAcquire(&(pstate->mutex));
+ n_count = lazy_count_vacstate_finished(pstate, round, &n_comp);
+ SpinLockRelease(&(pstate->mutex));
+
+ return (n_count + n_comp) == pstate->nworkers;
+}
+
+/*
+ * When counting the number of vacuum workers that have finished vacuuming the
+ * table and indexes, some workers could already have proceeded to a subsequent
+ * state on the next round. We count the workers that are in the same state or
+ * in a subsequent state on the next round. Caller must hold the mutex lock.
+ */
+static int
+lazy_count_vacstate_finished(LVParallelState *pstate, uint32 round, int *n_complete)
+{
+ int n_count = 0;
+ int n_comp = 0;
+ int i;
+
+ for (i = 0; i < pstate->nworkers; i++)
+ {
+ VacWorker *vacworker = &(pstate->vacworkers[i]);
+ uint32 w_round = vacworker->round;
+
+ if (((vacworker->state & VACSTATE_VACUUM_FINISHED) != 0 && w_round == round) ||
+ ((vacworker->state & VACPHASE_SCAN) != 0 && w_round == (round + 1)))
+ n_count++;
+ else if (vacworker->state == VACSTATE_COMPLETE)
+ n_comp++;
+ }
+
+ *n_complete = n_comp;
+
+ return n_count;
+}
+
+/*
+ * Return the maximum number of dead tuples that can be stored according
+ * to vac_work_mem.
+ */
+static long
+lazy_get_max_dead_tuple(LVRelStats *vacrelstats)
+{
+ long maxtuples;
+ int vac_work_mem = IsAutoVacuumWorkerProcess() &&
+ autovacuum_work_mem != -1 ?
+ autovacuum_work_mem : maintenance_work_mem;
+
+ if (vacrelstats->nindexes != 0)
+ {
+ maxtuples = (vac_work_mem * 1024L) / sizeof(ItemPointerData);
+ maxtuples = Min(maxtuples, INT_MAX);
+ maxtuples = Min(maxtuples, MaxAllocSize / sizeof(ItemPointerData));
+
+ /* curious coding here to ensure the multiplication can't overflow */
+ if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > vacrelstats->old_rel_pages)
+ maxtuples = vacrelstats->old_rel_pages * LAZY_ALLOC_TUPLES;
+
+ /* stay sane if small maintenance_work_mem */
+ maxtuples = Max(maxtuples, MaxHeapTuplesPerPage);
+ }
+ else
+ {
+ maxtuples = MaxHeapTuplesPerPage;
+ }
+
+ return maxtuples;
+}
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index a27e5ed..c3bf0d9 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1595,7 +1595,12 @@ _equalDropdbStmt(const DropdbStmt *a, const DropdbStmt *b)
static bool
_equalVacuumStmt(const VacuumStmt *a, const VacuumStmt *b)
{
- COMPARE_SCALAR_FIELD(options);
+ if (a->options.flags != b->options.flags)
+ return false;
+
+ if (a->options.nworkers != b->options.nworkers)
+ return false;
+
COMPARE_NODE_FIELD(relation);
COMPARE_NODE_FIELD(va_cols);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 9eef550..dcf0353 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -178,6 +178,7 @@ static void processCASbits(int cas_bits, int location, const char *constrType,
bool *deferrable, bool *initdeferred, bool *not_valid,
bool *no_inherit, core_yyscan_t yyscanner);
static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
+static VacuumOptions *makeVacOpt(VacuumOption opt, int nworkers);
%}
@@ -228,6 +229,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
struct ImportQual *importqual;
InsertStmt *istmt;
VariableSetStmt *vsetstmt;
+ VacuumOptions *vacopts;
PartitionElem *partelem;
PartitionSpec *partspec;
PartitionRangeDatum *partrange_datum;
@@ -292,7 +294,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
create_extension_opt_item alter_extension_opt_item
%type <ival> opt_lock lock_type cast_context
-%type <ival> vacuum_option_list vacuum_option_elem
+%type <vacopts> vacuum_option_list vacuum_option_elem
%type <boolean> opt_or_replace
opt_grant_grant_option opt_grant_admin_option
opt_nowait opt_if_exists opt_with_data
@@ -9720,47 +9722,59 @@ cluster_index_specification:
VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
{
VacuumStmt *n = makeNode(VacuumStmt);
- n->options = VACOPT_VACUUM;
+ VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
if ($2)
- n->options |= VACOPT_FULL;
+ vacopts->flags |= VACOPT_FULL;
if ($3)
- n->options |= VACOPT_FREEZE;
+ vacopts->flags |= VACOPT_FREEZE;
if ($4)
- n->options |= VACOPT_VERBOSE;
+ vacopts->flags |= VACOPT_VERBOSE;
+
+ n->options.flags = vacopts->flags;
+ n->options.nworkers = 1;
n->relation = NULL;
n->va_cols = NIL;
$$ = (Node *)n;
+ pfree(vacopts);
}
| VACUUM opt_full opt_freeze opt_verbose qualified_name
{
VacuumStmt *n = makeNode(VacuumStmt);
- n->options = VACOPT_VACUUM;
+ VacuumOptions *vacopts = makeVacOpt(VACOPT_VACUUM, 1);
if ($2)
- n->options |= VACOPT_FULL;
+ vacopts->flags |= VACOPT_FULL;
if ($3)
- n->options |= VACOPT_FREEZE;
+ vacopts->flags |= VACOPT_FREEZE;
if ($4)
- n->options |= VACOPT_VERBOSE;
+ vacopts->flags |= VACOPT_VERBOSE;
+
+ n->options.flags = vacopts->flags;
+ n->options.nworkers = 1;
n->relation = $5;
n->va_cols = NIL;
$$ = (Node *)n;
+ pfree(vacopts);
}
| VACUUM opt_full opt_freeze opt_verbose AnalyzeStmt
{
VacuumStmt *n = (VacuumStmt *) $5;
- n->options |= VACOPT_VACUUM;
+ n->options.flags |= VACOPT_VACUUM;
if ($2)
- n->options |= VACOPT_FULL;
+ n->options.flags |= VACOPT_FULL;
if ($3)
- n->options |= VACOPT_FREEZE;
+ n->options.flags |= VACOPT_FREEZE;
if ($4)
- n->options |= VACOPT_VERBOSE;
+ n->options.flags |= VACOPT_VERBOSE;
+ n->options.nworkers = 1;
$$ = (Node *)n;
}
| VACUUM '(' vacuum_option_list ')'
{
VacuumStmt *n = makeNode(VacuumStmt);
- n->options = VACOPT_VACUUM | $3;
+ VacuumOptions *vacopts = $3;
+
+ n->options.flags = vacopts->flags | VACOPT_VACUUM;
+ n->options.nworkers = vacopts->nworkers;
n->relation = NULL;
n->va_cols = NIL;
$$ = (Node *) n;
@@ -9768,29 +9782,52 @@ VacuumStmt: VACUUM opt_full opt_freeze opt_verbose
| VACUUM '(' vacuum_option_list ')' qualified_name opt_name_list
{
VacuumStmt *n = makeNode(VacuumStmt);
- n->options = VACOPT_VACUUM | $3;
+ VacuumOptions *vacopts = $3;
+
+ n->options.flags = vacopts->flags | VACOPT_VACUUM;
+ n->options.nworkers = vacopts->nworkers;
n->relation = $5;
n->va_cols = $6;
if (n->va_cols != NIL) /* implies analyze */
- n->options |= VACOPT_ANALYZE;
+ n->options.flags |= VACOPT_ANALYZE;
$$ = (Node *) n;
}
;
vacuum_option_list:
vacuum_option_elem { $$ = $1; }
- | vacuum_option_list ',' vacuum_option_elem { $$ = $1 | $3; }
+ | vacuum_option_list ',' vacuum_option_elem
+ {
+ VacuumOptions *vacopts1 = (VacuumOptions *)$1;
+ VacuumOptions *vacopts2 = (VacuumOptions *)$3;
+
+ vacopts1->flags |= vacopts2->flags;
+ if (vacopts1->nworkers < vacopts2->nworkers)
+ vacopts1->nworkers = vacopts2->nworkers;
+
+ $$ = vacopts1;
+ pfree(vacopts2);
+ }
;
vacuum_option_elem:
- analyze_keyword { $$ = VACOPT_ANALYZE; }
- | VERBOSE { $$ = VACOPT_VERBOSE; }
- | FREEZE { $$ = VACOPT_FREEZE; }
- | FULL { $$ = VACOPT_FULL; }
+ analyze_keyword { $$ = makeVacOpt(VACOPT_ANALYZE, 1); }
+ | VERBOSE { $$ = makeVacOpt(VACOPT_VERBOSE, 1); }
+ | FREEZE { $$ = makeVacOpt(VACOPT_FREEZE, 1); }
+ | FULL { $$ = makeVacOpt(VACOPT_FULL, 1); }
+ | PARALLEL ICONST
+ {
+ if ($2 < 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("parallel vacuum degree must be more than 1"),
+ parser_errposition(@1)));
+ $$ = makeVacOpt(VACOPT_PARALLEL, $2);
+ }
| IDENT
{
if (strcmp($1, "disable_page_skipping") == 0)
- $$ = VACOPT_DISABLE_PAGE_SKIPPING;
+ $$ = makeVacOpt(VACOPT_DISABLE_PAGE_SKIPPING, 1);
else
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -9798,27 +9835,36 @@ vacuum_option_elem:
parser_errposition(@1)));
}
;
-
AnalyzeStmt:
analyze_keyword opt_verbose
{
VacuumStmt *n = makeNode(VacuumStmt);
- n->options = VACOPT_ANALYZE;
+ VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
if ($2)
- n->options |= VACOPT_VERBOSE;
+ vacopts->flags |= VACOPT_VERBOSE;
+
+ n->options.flags = vacopts->flags;
+ n->options.nworkers = 1;
n->relation = NULL;
n->va_cols = NIL;
$$ = (Node *)n;
+ pfree(vacopts);
}
| analyze_keyword opt_verbose qualified_name opt_name_list
{
VacuumStmt *n = makeNode(VacuumStmt);
- n->options = VACOPT_ANALYZE;
+ VacuumOptions *vacopts = makeVacOpt(VACOPT_ANALYZE, 1);
+
if ($2)
- n->options |= VACOPT_VERBOSE;
+ vacopts->flags |= VACOPT_VERBOSE;
+
+ n->options.flags = vacopts->flags;
+ n->options.nworkers = 1;
n->relation = $3;
n->va_cols = $4;
$$ = (Node *)n;
+ pfree(vacopts);
}
;
@@ -15284,6 +15330,16 @@ makeRecursiveViewSelect(char *relname, List *aliases, Node *query)
return (Node *) s;
}
+static VacuumOptions *
+makeVacOpt(VacuumOption opt, int nworkers)
+{
+ VacuumOptions *vacopts = palloc(sizeof(VacuumOptions));
+
+ vacopts->flags = opt;
+ vacopts->nworkers = nworkers;
+ return vacopts;
+}
+
/* parser_init()
* Initialize to parse one query string
*/
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 251b9fe..5cc683f 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -186,7 +186,7 @@ typedef struct av_relation
typedef struct autovac_table
{
Oid at_relid;
- int at_vacoptions; /* bitmask of VacuumOption */
+ VacuumOptions at_vacoptions; /* contains bitmask of VacuumOption */
VacuumParams at_params;
int at_vacuum_cost_delay;
int at_vacuum_cost_limit;
@@ -2414,7 +2414,7 @@ do_autovacuum(void)
* next table in our list.
*/
HOLD_INTERRUPTS();
- if (tab->at_vacoptions & VACOPT_VACUUM)
+ if (tab->at_vacoptions.flags & VACOPT_VACUUM)
errcontext("automatic vacuum of table \"%s.%s.%s\"",
tab->at_datname, tab->at_nspname, tab->at_relname);
else
@@ -2651,10 +2651,11 @@ table_recheck_autovac(Oid relid, HTAB *table_toast_map,
tab = palloc(sizeof(autovac_table));
tab->at_relid = relid;
tab->at_sharedrel = classForm->relisshared;
- tab->at_vacoptions = VACOPT_SKIPTOAST |
+ tab->at_vacoptions.flags = VACOPT_SKIPTOAST |
(dovacuum ? VACOPT_VACUUM : 0) |
(doanalyze ? VACOPT_ANALYZE : 0) |
(!wraparound ? VACOPT_NOWAIT : 0);
+ tab->at_vacoptions.nworkers = 1;
tab->at_params.freeze_min_age = freeze_min_age;
tab->at_params.freeze_table_age = freeze_table_age;
tab->at_params.multixact_freeze_min_age = multixact_freeze_min_age;
@@ -2901,10 +2902,10 @@ autovac_report_activity(autovac_table *tab)
int len;
/* Report the command and possible options */
- if (tab->at_vacoptions & VACOPT_VACUUM)
+ if (tab->at_vacoptions.flags & VACOPT_VACUUM)
snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
"autovacuum: VACUUM%s",
- tab->at_vacoptions & VACOPT_ANALYZE ? " ANALYZE" : "");
+ tab->at_vacoptions.flags & VACOPT_ANALYZE ? " ANALYZE" : "");
else
snprintf(activity, MAX_AUTOVAC_ACTIV_LEN,
"autovacuum: ANALYZE");
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 127dc86..ab435ce 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -654,7 +654,7 @@ standard_ProcessUtility(Node *parsetree,
VacuumStmt *stmt = (VacuumStmt *) parsetree;
/* we choose to allow this during "read only" transactions */
- PreventCommandDuringRecovery((stmt->options & VACOPT_VACUUM) ?
+ PreventCommandDuringRecovery((stmt->options.flags & VACOPT_VACUUM) ?
"VACUUM" : "ANALYZE");
/* forbidden in parallel mode due to CommandIsReadOnly */
ExecVacuum(stmt, isTopLevel);
@@ -2394,7 +2394,7 @@ CreateCommandTag(Node *parsetree)
break;
case T_VacuumStmt:
- if (((VacuumStmt *) parsetree)->options & VACOPT_VACUUM)
+ if (((VacuumStmt *) parsetree)->options.flags & VACOPT_VACUUM)
tag = "VACUUM";
else
tag = "ANALYZE";
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 92afc32..37d6780 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1991,7 +1991,6 @@ EstimateSnapshotSpace(Snapshot snap)
Size size;
Assert(snap != InvalidSnapshot);
- Assert(snap->satisfies == HeapTupleSatisfiesMVCC);
/* We allocate any XID arrays needed in the same palloc block. */
size = add_size(sizeof(SerializedSnapshotData),
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index ee7e05a..712f70e 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -131,6 +131,7 @@ extern Size heap_parallelscan_estimate(Snapshot snapshot);
extern void heap_parallelscan_initialize(ParallelHeapScanDesc target,
Relation relation, Snapshot snapshot);
extern HeapScanDesc heap_beginscan_parallel(Relation, ParallelHeapScanDesc);
+extern BlockNumber heap_parallelscan_nextpage(HeapScanDesc scan);
extern bool heap_fetch(Relation relation, Snapshot snapshot,
HeapTuple tuple, Buffer *userbuf, bool keep_buf,
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 541c2fa..7fecbae 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -15,6 +15,7 @@
#define VACUUM_H
#include "access/htup.h"
+#include "access/heapam.h"
#include "catalog/pg_statistic.h"
#include "catalog/pg_type.h"
#include "nodes/parsenodes.h"
@@ -158,7 +159,7 @@ extern int vacuum_multixact_freeze_table_age;
/* in commands/vacuum.c */
extern void ExecVacuum(VacuumStmt *vacstmt, bool isTopLevel);
-extern void vacuum(int options, RangeVar *relation, Oid relid,
+extern void vacuum(VacuumOptions options, RangeVar *relation, Oid relid,
VacuumParams *params, List *va_cols,
BufferAccessStrategy bstrategy, bool isTopLevel);
extern void vac_open_indexes(Relation relation, LOCKMODE lockmode,
@@ -189,7 +190,7 @@ extern void vac_update_datfrozenxid(void);
extern void vacuum_delay_point(void);
/* in commands/vacuumlazy.c */
-extern void lazy_vacuum_rel(Relation onerel, int options,
+extern void lazy_vacuum_rel(Relation onerel, VacuumOptions options,
VacuumParams *params, BufferAccessStrategy bstrategy);
/* in commands/analyze.c */
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 7ceaa22..d19dad7 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2936,13 +2936,20 @@ typedef enum VacuumOption
VACOPT_FULL = 1 << 4, /* FULL (non-concurrent) vacuum */
VACOPT_NOWAIT = 1 << 5, /* don't wait to get lock (autovacuum only) */
VACOPT_SKIPTOAST = 1 << 6, /* don't process the TOAST table, if any */
- VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7 /* don't skip any pages */
+ VACOPT_DISABLE_PAGE_SKIPPING = 1 << 7, /* don't skip any pages */
+ VACOPT_PARALLEL = 1 << 8 /* do VACUUM in parallel */
} VacuumOption;
+typedef struct VacuumOptions
+{
+ VacuumOption flags; /* OR of VacuumOption flags */
+ int nworkers; /* # of parallel vacuum workers */
+} VacuumOptions;
+
typedef struct VacuumStmt
{
NodeTag type;
- int options; /* OR of VacuumOption flags */
+ VacuumOptions options;
RangeVar *relation; /* single table to process, or NULL */
List *va_cols; /* list of column names, or NIL for all */
} VacuumStmt;
diff --git a/src/test/regress/expected/vacuum.out b/src/test/regress/expected/vacuum.out
index 9b604be..bc83323 100644
--- a/src/test/regress/expected/vacuum.out
+++ b/src/test/regress/expected/vacuum.out
@@ -80,5 +80,6 @@ CONTEXT: SQL function "do_analyze" statement 1
SQL function "wrap_do_analyze" statement 1
VACUUM FULL vactst;
VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
DROP TABLE vaccluster;
DROP TABLE vactst;
diff --git a/src/test/regress/sql/vacuum.sql b/src/test/regress/sql/vacuum.sql
index 7b819f6..46355ec 100644
--- a/src/test/regress/sql/vacuum.sql
+++ b/src/test/regress/sql/vacuum.sql
@@ -61,6 +61,7 @@ VACUUM FULL vaccluster;
VACUUM FULL vactst;
VACUUM (DISABLE_PAGE_SKIPPING) vaccluster;
+VACUUM (PARALLEL 2) vactst;
DROP TABLE vaccluster;
DROP TABLE vactst;
On Fri, Jan 6, 2017 at 2:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
table_size | indexes | parallel_degree | time
------------+---------+-----------------+----------
6.5GB | 0 | 1 | 00:00:14
6.5GB | 0 | 2 | 00:00:02
6.5GB | 0 | 4 | 00:00:02
Those numbers look highly suspect.
Are you sure you're not experiencing caching effects? (ie: maybe you
ran the second and third vacuums after the first, and didn't flush the
page cache, so the table was cached)
6.5GB | 2 | 1 | 00:02:18
6.5GB | 2 | 2 | 00:00:38
6.5GB | 2 | 4 | 00:00:46
...
13GB | 0 | 1 | 00:03:52
13GB | 0 | 2 | 00:00:49
13GB | 0 | 4 | 00:00:50
..
13GB | 2 | 1 | 00:12:42
13GB | 2 | 2 | 00:01:17
13GB | 2 | 4 | 00:02:12
These would also be consistent with caching effects
On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.
I got some advices at PGConf.ASIA 2016 and started to work on this again.
The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page.
How are you planning to ensure the same in the deadlock detector? Currently,
deadlock detector considers members from same lock group as
non-blocking. If you think we don't need to make any changes in
deadlock detector, then explain why so?
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.
I got some advices at PGConf.ASIA 2016 and started to work on this again.
The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page.
How are planning to ensure the same in deadlock detector? Currently,
deadlock detector considers members from same lock group as
non-blocking. If you think we don't need to make any changes in
deadlock detector, then explain why so?
Thank you for the comment.
I had not considered the necessity of deadlock detection support. But
because lazy_scan_heap acquires the relation extension lock and
releases it before acquiring another extension lock, I guess we don't
need those changes for parallel lazy vacuum. Thoughts?
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On 9 January 2017 at 08:48, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I had not considered necessity of dead lock detection support.
It seems like a big potential win to scan multiple indexes in parallel.
What do we actually gain from having the other parts of VACUUM execute
in parallel? Does truncation happen faster in parallel? ISTM we might
reduce the complexity of this if there is no substantial gain.
Can you give us some timings for performance of the different phases,
with varying levels of parallelism?
Does the design for collecting dead TIDs use a variable amount of
memory? Does this work negate the other work to allow VACUUM to use >
1GB memory?
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 7, 2017 at 7:18 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
On Fri, Jan 6, 2017 at 2:38 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
table_size | indexes | parallel_degree | time
------------+---------+-----------------+----------
6.5GB | 0 | 1 | 00:00:14
6.5GB | 0 | 2 | 00:00:02
6.5GB | 0 | 4 | 00:00:02
Those numbers look highly suspect.
Are you sure you're not experiencing caching effects? (ie: maybe you
ran the second and third vacuums after the first, and didn't flush the
page cache, so the table was cached)
6.5GB | 2 | 1 | 00:02:18
6.5GB | 2 | 2 | 00:00:38
6.5GB | 2 | 4 | 00:00:46
...
13GB | 0 | 1 | 00:03:52
13GB | 0 | 2 | 00:00:49
13GB | 0 | 4 | 00:00:50
..
13GB | 2 | 1 | 00:12:42
13GB | 2 | 2 | 00:01:17
13GB | 2 | 4 | 00:02:12
These would also be consistent with caching effects
Since I ran vacuum after updating all pages of the table, I thought that
all the data were in either shared buffers or the OS cache. But anyway, I
measured it only once, so this result is not accurate. I'll test
again and measure it several times.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Jan 9, 2017 at 2:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.
I got some advices at PGConf.ASIA 2016 and started to work on this again.
The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page.
How are planning to ensure the same in deadlock detector? Currently,
deadlock detector considers members from same lock group as
non-blocking. If you think we don't need to make any changes in
deadlock detector, then explain why so?
Thank you for comment.
I had not considered necessity of dead lock detection support. But
because lazy_scan_heap actquires the relation extension lock and
release it before acquiring another extension lock, I guess we don't
need that changes for parallel lazy vacuum. Thought?
Okay, but it is quite possible that lazy_scan_heap is not able to
acquire the required lock because it is already held by another
process (which is not part of the group performing vacuum); then all the
processes in a group might need to run the deadlock detector code, in
which multiple places assume that group members won't conflict. As an
example, refer to the code in TopoSort where it tries to emit all
groupmates together; IIRC, the basis of that part of the code is that
groupmates won't conflict with each other, and this patch will break
that assumption. I have not looked into the parallel vacuum patch, but
the changes in 000_make_group_locking_conflict_extend_lock_v2 don't
appear to be safe. Even if your parallel vacuum patch doesn't need any
change in the deadlock detector, as proposed it appears that the
changes in locking will behave the same for any operation performing
relation extension. So in future any parallel operation (say parallel
copy/insert) which involves the relation extension lock will behave
similarly. Is that okay, or are you assuming that the next person
developing any such feature should rethink this problem and extend
your solution to match their requirements?
What do we actually gain from having the other parts of VACUUM execute
in parallel? Does truncation happen faster in parallel?
I think all CPU-intensive operations on the heap (like checking
dead/live rows, processing dead tuples, etc.) can be faster.
Can you give us some timings for performance of the different phases,
with varying levels of parallelism?
I feel the timings depend on the kind of test we perform; for example, if
there are many dead rows in the heap and few indexes on the table,
we might see that the gain from doing a parallel heap scan is
substantial.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Jan 10, 2017 at 3:46 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jan 9, 2017 at 2:18 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sat, Jan 7, 2017 at 2:47 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Jan 6, 2017 at 11:08 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Mon, Oct 3, 2016 at 11:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
On Fri, Sep 16, 2016 at 6:56 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yeah, I don't have a good solution for this problem so far.
We might need to improve group locking mechanism for the updating
operation or came up with another approach to resolve this problem.
For example, one possible idea is that the launcher process allocates
vm and fsm enough in advance in order to avoid extending fork relation
by parallel workers, but it's not resolve fundamental problem.
I got some advices at PGConf.ASIA 2016 and started to work on this again.
The most big problem so far is the group locking. As I mentioned
before, parallel vacuum worker could try to extend the same visibility
map page at the same time. So we need to make group locking conflict
in some cases, or need to eliminate the necessity of acquiring
extension lock. Attached 000 patch uses former idea, which makes the
group locking conflict between parallel workers when parallel worker
tries to acquire extension lock on same page.
How are planning to ensure the same in deadlock detector? Currently,
deadlock detector considers members from same lock group as
non-blocking. If you think we don't need to make any changes in
deadlock detector, then explain why so?
Thank you for comment.
I had not considered necessity of dead lock detection support. But
because lazy_scan_heap actquires the relation extension lock and
release it before acquiring another extension lock, I guess we don't
need that changes for parallel lazy vacuum. Thought?
Okay, but it is quite possible that lazy_scan_heap is not able to
acquire the required lock as that is already acquired by another
process (which is not part of group performing Vacuum), then all the
processes in a group might need to run deadlock detector code wherein
multiple places, it has been assumed that group members won't
conflict. As an example, refer code in TopoSort where it is trying to
emit all groupmates together and IIRC, the basis of that part of the
code is groupmates won't conflict with each other and this patch will
break that assumption. I have not looked into the parallel vacuum
patch, but changes in 000_make_group_locking_conflict_extend_lock_v2
doesn't appear to be safe. Even if your parallel vacuum patch doesn't
need any change in deadlock detector, then also as proposed it appears
that changes in locking will behave same for any of the operations
performing relation extension. So in future any parallel operation
(say parallel copy/insert) which involves relation extension lock will
behave similary. Is that okay or are you assuming that the next
person developing any such feature should rethink about this problem
and extends your solution to match his requirement.
Thank you for the explanation. I agree that we should support deadlock
detection as well in this patch, even if this feature doesn't actually
need it. I'm going to extend the 000 patch to support deadlock
detection.
What do we actually gain from having the other parts of VACUUM execute
in parallel? Does truncation happen faster in parallel?
I think all CPU intensive operations for heap (like checking of
dead/live rows, processing of dead tuples, etc.) can be faster.
Vacuum on a table with no indexes can be faster as well.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Mon, Jan 9, 2017 at 6:01 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On 9 January 2017 at 08:48, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
I had not considered necessity of dead lock detection support.
It seems like a big potential win to scan multiple indexes in parallel.
Does the design for collecting dead TIDs use a variable amount of
memory?
No. Collecting dead TIDs and the calculation of the maximum number of dead
tuples are the same as in current lazy vacuum. That is, the memory space
for dead TIDs is allocated with a fixed size. In parallel lazy vacuum that
memory space is allocated in dynamic shared memory; otherwise it is
allocated in local memory.
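To make the layout concrete, here is a minimal standalone sketch (not part of
the patch; the worker count and tuple limit are made-up values) of how a
fixed-size per-worker dead-TID area could be carved out of one shared
allocation, mirroring the LVDeadTuple headers followed by the per-worker
arrays:

#include <stdio.h>
#include <stddef.h>

/* Simplified stand-ins for the patch's structures */
typedef struct ItemPointerData { unsigned short bi_hi, bi_lo, offnum; } ItemPointerData;
typedef struct LVDeadTuple { int n_dt; ItemPointerData *dt_array; } LVDeadTuple;

int
main(void)
{
    int     nworkers = 4;               /* assumed parallel degree */
    long    max_dead_tuples = 1000000;  /* would come from maintenance_work_mem */
    size_t  headers = sizeof(LVDeadTuple) * nworkers;
    size_t  arrays = sizeof(ItemPointerData) * (size_t) max_dead_tuples * nworkers;

    /* The whole area is sized once, before any worker is launched */
    printf("shared dead-TID area: %zu bytes\n", headers + arrays);

    /* Worker i's TID array begins right after all the LVDeadTuple headers */
    for (int i = 0; i < nworkers; i++)
        printf("worker %d array offset: %zu\n", i,
               headers + sizeof(ItemPointerData) * (size_t) max_dead_tuples * i);
    return 0;
}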
Does this work negate the other work to allow VACUUM to use >
1GB memory?
Partly, yes. Because the memory space for dead TIDs needs to be allocated
in DSM before the vacuum workers launch, parallel lazy vacuum cannot use
a variable amount of memory the way that work does. But in
non-parallel lazy vacuum, that work would be effective. We might be
able to do a similar thing using DSA, but I'm not sure that is better.
Attached are the results of a performance test with scale factor = 500 and
the test script I used. I measured each test four times and plotted the
average of the last three execution times in the sf_500.png file. When the
table has indexes, vacuum execution time is smallest when the number of
indexes and the parallel degree are the same.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
sf_500.png (image/png)
On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Attached result of performance test with scale factor = 500 and the
test script I used. I measured each test at four times and plot
average of last three execution times to sf_500.png file. When table
has index, vacuum execution time is smallest when number of index and
parallel degree is same.
It does seem from those results that parallel heap scans aren't paying
off, and in fact are hurting.
It could be your I/O that's at odds with the parallel degree settings
rather than the approach (i.e., your I/O system can't handle that many
parallel scans), but in any case it warrants a few more tests.
I'd suggest you try to:
1. Disable parallel lazy vacuum, leave parallel index scans
2. Limit parallel degree to number of indexes, leaving parallel lazy
vacuum enabled
3. Cap the lazy vacuum parallel degree at effective_io_concurrency, and
the index scan parallel degree at the number of indexes (a small sketch
of this capping rule follows below)
And compare against your earlier test results.
I suspect 1 could be the winner, but 3 has a chance too (if e_i_c is
properly set up for your I/O system).
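To make the capping rule in point 3 concrete, here is a minimal, standalone C
sketch. It is not from the patch; the names requested_workers,
effective_io_concurrency (treated as a plain integer), nindexes and
choose_degrees are hypothetical, just to show the arithmetic.

#include <stdio.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/*
 * Cap the heap-scan degree by effective_io_concurrency and the
 * index-vacuum degree by the number of indexes (illustrative only).
 */
static void
choose_degrees(int requested_workers, int effective_io_concurrency,
               int nindexes, int *heap_degree, int *index_degree)
{
    *heap_degree  = min_int(requested_workers, effective_io_concurrency);
    *index_degree = min_int(requested_workers, nindexes);
}

int
main(void)
{
    int heap_degree, index_degree;

    /* e.g. 16 requested workers, e_i_c = 4, table with 8 indexes */
    choose_degrees(16, 4, 8, &heap_degree, &index_degree);
    printf("heap scan degree = %d, index vacuum degree = %d\n",
           heap_degree, index_degree);
    return 0;
}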
On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Does this work negate the other work to allow VACUUM to use >
1GB memory?
Partly yes. Because memory space for dead TIDs needs to be allocated
in DSM before vacuum worker launches, parallel lazy vacuum cannot use
such a variable amount of memory as that work does. But in
non-parallel lazy vacuum, that work would be effective. We might be
able to do similar thing using DSA but I'm not sure that is better.
I think it would work well with DSA as well.
Just instead of having a single segment list, you'd have one per worker.
Since workers work on disjoint tid sets, that shouldn't pose a problem.
The segment list can be joined together later rather efficiently
(simple logical joining of the segment pointer arrays) for the index
scan phases.
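To make the "logical joining of the segment pointer arrays" concrete, here is a
toy, self-contained C sketch of the idea. It uses plain malloc rather than DSA,
and the names DeadTid, TidSegment, WorkerTidList and join_segment_lists are
invented for illustration; they are not PostgreSQL structures or the patch's code.

#include <stdio.h>
#include <stdlib.h>

typedef struct { unsigned block; unsigned short offset; } DeadTid;

typedef struct { DeadTid *tids; int ntids; } TidSegment;

typedef struct { TidSegment *segments; int nsegments; } WorkerTidList;

/* Leader-side join: gather every worker's segment pointers into one array,
 * without copying any of the TIDs themselves. */
static TidSegment *
join_segment_lists(WorkerTidList *workers, int nworkers, int *nsegments_out)
{
    int total = 0, pos = 0;

    for (int w = 0; w < nworkers; w++)
        total += workers[w].nsegments;

    TidSegment *joined = malloc(sizeof(TidSegment) * total);
    for (int w = 0; w < nworkers; w++)
        for (int s = 0; s < workers[w].nsegments; s++)
            joined[pos++] = workers[w].segments[s];   /* pointer copy only */

    *nsegments_out = total;
    return joined;
}

int
main(void)
{
    /* Two workers, each with one small segment of dead TIDs on disjoint blocks. */
    DeadTid a[] = {{1, 3}, {1, 7}};
    DeadTid b[] = {{500, 2}};
    TidSegment sa = {a, 2}, sb = {b, 1};
    WorkerTidList workers[2] = {{&sa, 1}, {&sb, 1}};

    int n;
    TidSegment *joined = join_segment_lists(workers, 2, &n);
    printf("joined %d segments, first TID (%u,%u)\n",
           n, joined[0].tids[0].block, (unsigned) joined[0].tids[0].offset);
    free(joined);
    return 0;
}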
On 1/10/17 11:23 AM, Claudio Freire wrote:
On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Does this work negate the other work to allow VACUUM to use >
1GB memory?
Partly yes. Because memory space for dead TIDs needs to be allocated
in DSM before vacuum worker launches, parallel lazy vacuum cannot use
such a variable amount of memory as that work does. But in
non-parallel lazy vacuum, that work would be effective. We might be
able to do similar thing using DSA but I'm not sure that is better.
I think it would work well with DSA as well.
Just instead of having a single segment list, you'd have one per worker.
Since workers work on disjoint tid sets, that shouldn't pose a problem.
The segment list can be joined together later rather efficiently
(simple logical joining of the segment pointer arrays) for the index
scan phases.
It's been a while since there was any movement on this patch and quite a
few issues have been raised.
Have you tried the approaches that Claudio suggested?
--
-David
david@pgmasters.net
On Fri, Mar 3, 2017 at 11:01 PM, David Steele <david@pgmasters.net> wrote:
On 1/10/17 11:23 AM, Claudio Freire wrote:
On Tue, Jan 10, 2017 at 6:42 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Does this work negate the other work to allow VACUUM to use >
1GB memory?
Partly yes. Because memory space for dead TIDs needs to be allocated
in DSM before vacuum worker launches, parallel lazy vacuum cannot use
such a variable amount of memory as that work does. But in
non-parallel lazy vacuum, that work would be effective. We might be
able to do similar thing using DSA but I'm not sure that is better.
I think it would work well with DSA as well.
Just instead of having a single segment list, you'd have one per worker.
Since workers work on disjoint tid sets, that shouldn't pose a problem.
The segment list can be joined together later rather efficiently
(simple logical joining of the segment pointer arrays) for the index
scan phases.
It's been a while since there was any movement on this patch and quite a
few issues have been raised.
Have you tried the approaches that Claudio suggested?
Yes, it's taking time to update the logic and the measurements, but it's
coming along. Also, I'm working on changing the deadlock detection. I will
post a new patch and measurement results.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.
I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.
From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.
I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.
From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.
Agreed. There is surely some room to discuss the design yet,
and that will take a long time. It's good to push this out to CF2017-07.
Thank you for the comment.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On 3/4/17 9:08 PM, Masahiko Sawada wrote:
On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.
I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.
From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.
Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.
I have marked this patch "Returned with Feedback." Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.
--
-David
david@pgmasters.net
On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:
On 3/4/17 9:08 PM, Masahiko Sawada wrote:
On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.
I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.
From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.
Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.
I have marked this patch "Returned with Feedback." Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.
Thank you!
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
On Sun, Mar 5, 2017 at 4:09 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
On Sun, Mar 5, 2017 at 12:14 PM, David Steele <david@pgmasters.net> wrote:
On 3/4/17 9:08 PM, Masahiko Sawada wrote:
On Sat, Mar 4, 2017 at 5:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Mar 3, 2017 at 9:50 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
Yes, it's taking a time to update logic and measurement but it's
coming along. Also I'm working on changing deadlock detection. Will
post new patch and measurement result.
I think that we should push this patch out to v11. I think there are
too many issues here to address in the limited time we have remaining
this cycle, and I believe that if we try to get them all solved in the
next few weeks we're likely to end up getting backed into some choices
by time pressure that we may later regret bitterly. Since I created
the deadlock issues that this patch is facing, I'm willing to try to
help solve them, but I think it's going to require considerable and
delicate surgery, and I don't think doing that under time pressure is
a good idea.
From a fairness point of view, a patch that's not in reviewable shape
on March 1st should really be pushed out, and we're several days past
that.
Agreed. There are surely some rooms to discuss about the design yet,
and it will take long time. it's good to push this out to CF2017-07.
Thank you for the comment.
I have marked this patch "Returned with Feedback." Of course you are
welcome to submit this patch to the 2017-07 CF, or whenever you feel it
is ready.
Thank you!
I re-considered the basic design of parallel lazy vacuum. I didn't
change the basic concept of this feature or its usage: the lazy vacuum
still executes with some parallel workers. In the current design, dead
tuple TIDs are shared with all vacuum workers, including the leader process,
when the table has indexes. If we share dead tuple TIDs, we need two
synchronization points: before starting the vacuum pass and before clearing
the dead tuple TIDs. Before starting the vacuum pass we have to make sure
that no more dead tuple TIDs will be added, and before clearing the dead
tuple TIDs we have to make sure that they are no longer in use.
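As a rough illustration of those two synchronization points, here is a minimal
standalone sketch using pthread barriers. The real patch would of course use
PostgreSQL shared memory and its own IPC rather than pthreads; the names
before_vacuum and before_clear and the whole program are illustrative only.

#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4

static pthread_barrier_t before_vacuum;   /* no more TIDs will be added */
static pthread_barrier_t before_clear;    /* TIDs are no longer in use  */

static void *
worker(void *arg)
{
    long id = (long) arg;

    printf("worker %ld: collecting dead tuple TIDs\n", id);
    pthread_barrier_wait(&before_vacuum);      /* sync point 1 */

    printf("worker %ld: vacuuming heap/indexes using shared TIDs\n", id);
    pthread_barrier_wait(&before_clear);       /* sync point 2 */

    /* Only after this point may the shared TID array be cleared. */
    return NULL;
}

int
main(void)
{
    pthread_t threads[NWORKERS];

    pthread_barrier_init(&before_vacuum, NULL, NWORKERS);
    pthread_barrier_init(&before_clear, NULL, NWORKERS);

    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&threads[i], NULL, worker, (void *) i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);

    printf("leader: all workers done, clearing dead tuple TIDs\n");

    pthread_barrier_destroy(&before_vacuum);
    pthread_barrier_destroy(&before_clear);
    return 0;
}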
For index vacuum, each index is assigned to a vacuum worker based
on ParallelWorkerNumber. For example, if a table has 5 indexes and we
vacuum with 2 workers, the leader process and one vacuum worker are each
assigned 2 indexes, and the other vacuum worker is assigned the
remaining one. The following steps describe how the parallel vacuum
proceeds when the table has indexes (a small sketch of steps 2 and 4
follows the list).
1. The leader process and the workers scan the table in parallel using
ParallelHeapScanDesc, and collect dead tuple TIDs into shared memory.
2. Before vacuuming the table, the leader process sorts the dead tuple TIDs
in physical order once all workers have finished scanning the table.
3. When vacuuming the table, the leader process and the workers reclaim
garbage from the table in block-level parallel fashion.
4. When vacuuming the indexes, each index on the table is assigned to a
particular parallel worker or to the leader process, and the assigned
process vacuums that index.
5. Before going back to scanning the table, the leader process clears the
dead tuple TIDs once all workers have finished vacuuming the table and
indexes.
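Here is a small self-contained C sketch of steps 2 and 4: sorting the dead
tuple TIDs into physical (block, offset) order, and assigning each index to a
process by a simple modulo on the worker number. The DeadTid type, tid_cmp and
the example counts are invented for illustration and do not come from the patch.

#include <stdio.h>
#include <stdlib.h>

typedef struct { unsigned block; unsigned short offset; } DeadTid;

/* qsort comparator: block number first, then offset within the block. */
static int
tid_cmp(const void *a, const void *b)
{
    const DeadTid *x = a, *y = b;

    if (x->block != y->block)
        return (x->block < y->block) ? -1 : 1;
    if (x->offset != y->offset)
        return (x->offset < y->offset) ? -1 : 1;
    return 0;
}

int
main(void)
{
    DeadTid tids[] = {{42, 5}, {7, 9}, {42, 1}, {7, 2}};
    int     ntids = 4;
    int     nindexes = 5;       /* e.g. 5 indexes ...            */
    int     nprocesses = 3;     /* ... leader + 2 vacuum workers */

    /* Step 2: sort the shared TID array in physical order. */
    qsort(tids, ntids, sizeof(DeadTid), tid_cmp);
    for (int i = 0; i < ntids; i++)
        printf("tid (%u,%u)\n", tids[i].block, (unsigned) tids[i].offset);

    /* Step 4: index i is vacuumed by process (i % nprocesses). */
    for (int i = 0; i < nindexes; i++)
        printf("index %d -> process %d\n", i, i % nprocesses);

    return 0;
}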
Attached is the latest patch, but it's still a PoC version and
contains some debug code. Note that this patch still requires another
patch which moves the relation extension lock out of the heavy-weight
lock [1]. The parallel lazy vacuum patch could work even without the [1]
patch, but could fail during vacuum in some cases.
Also, I attached the result of a performance evaluation. The table size
is approximately 300MB ( > shared_buffers) and I deleted tuples on
every block before executing vacuum so that vacuum visits every block.
The server spec is:
* Intel Xeon E5620 @ 2.4GHz (8 cores)
* 32GB RAM
* ioDrive
According to the result for the table with indexes, the performance of lazy
vacuum improves up to the point where the number of indexes and the parallel
degree are the same. If a table has 16 indexes and we vacuum with 16
workers, parallel vacuum is 10x faster than single-process execution.
Also, according to the result for the table with no indexes, parallel
vacuum is 5x faster than single-process execution at parallel degree 8.
Of course, we could also parallelize only the index vacuum.
I'm planning to work on this in PG11 and will register it for the next CF.
Comments and feedback are very welcome.
[1]: /messages/by-id/CAD21AoAmdW7eWKi28FkXXd_4fjSdzVDpeH1xYVX7qx=SyqYyJA@mail.gmail.com
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Attachments:
result_0_to_16.png (image/png)